Abstract


The VHSIC Hardware Description Language (VHDL) was applied to the problem of modeling a VHSIC class circuit being designed by the VLSI design group at the Air Force Institute of Technology. A methodology was defined to decompose and model the circuit using the hierarchical facilities of the VHDL. The circuit embeds the Winograd Fourier Transform Algorithm (WFTA) into a pipelined serial architecture. This architecture was modeled using the VHDL and C programming languages. A custom simulation tool was developed to verify the timing, control, and hardware macrocells used to implement the WFTA processor. This simulation modeled the architecture at the bit level and validated the design.
MODELING AND SIMULATION OF A
SIGNAL PROCESSOR IMPLEMENTING
THE WINOGRAD FOURIER TRANSFORM


Chapter 1
Introduction


1.1. Overview

Continuing advances in the state of the art of silicon fabrication technology have allowed a tremendous increase in both the functionality and the performance that can be achieved by a single integrated circuit. The natural counterpart of this increased functionality is, of course, increased design complexity, which limits an individual designer's ability to completely understand the circuit being designed. Large ICs are therefore now developed by design teams, which leads to another problem: how to concisely and accurately communicate design information among the members of the team.

The formal language oriented approach, using hardware description languages (HDLs), is one method used to describe and model electronic circuits. Unfortunately, most HDLs were developed in a simpler time, when IC functionality was limited to small and medium scale circuits. As we head into the very large scale and very high speed integrated circuit (VLSI, VHSIC) era, there is a need for tools that can both model and simulate these complex ICs in a concise and timely fashion.

At one time, military applications drove the state of the art in the electronics industry. Potential commercial spinoffs encouraged industry to pursue Department of Defense (DoD) business, as military integrated circuits (ICs) were sufficiently general purpose to be directly applicable to marketable products. As the industry grew, however, the DoD share of the total IC market fell to under 10% [11]. In addition, the continual need to maintain technological superiority over potential adversaries required ever more complex special-purpose circuitry, driving the DoD into an increasingly specialized sector of the marketplace [11].

As military and civilian applications began to diverge, with the military driving toward high speed signal processors and the civilian market toward general purpose data processors, it became apparent to DoD planners that industry could no longer be expected to develop ICs for military applications in a timely manner. Thus, in 1980, the DoD launched the Very High Speed Integrated Circuit (VHSIC) technology development program. Formulated as a seed program, it was designed to spur the development of technology directed toward military needs. It was anticipated that, once the technology was available, industry would find civilian applications that would complement future military needs. The major goals of the VHSIC program are the development of the technology needed to produce submicron devices, increased processing throughput, and the formulation of the new circuit design methodologies and computer-aided design (CAD) tools required for maximum exploitation of the new technology [16].

Insertion of the new technology into existing weapons systems is considered a priority goal. Compared with systems built from current technology, systems using the new VHSIC class ICs are expected to have reduced size, weight, and power requirements, which should decrease their cost and increase their reliability and maintainability. The VHSIC program office plans to demonstrate the replacement of over 50 ICs in current systems with one VHSIC chip. This implies that the VHSIC chip could have upwards of 250,000 logic gates, an extremely complicated part to design and validate. Modeling and simulation of a circuit of this complexity could easily be on the critical path toward a correct implementation of the intended function. However, current simulation languages are not capable of simulating large circuit designs in a timely manner.

After surveying existing Hardware Description Languages (HDLs), the VHSIC program office decided that none would adequately meet its projected requirements, and it therefore funded the development of a VHSIC HDL (VHDL) to meet both present and anticipated applications. VHDL, based on Ada, the new DoD standard High Order Language, incorporates VHSIC-specific requirements such as portability, maintainability, timing, and the ability to do hierarchical modeling and simulation. VHDL is now in its final design stage. A test version of the VHDL simulator is scheduled to be made available to the Air Force Institute of Technology (AFIT) for beta-site testing during the spring of 1986. Although VHDL has been designated as the DoD standard HDL for VHSIC circuitry, a significant amount of work remains to evaluate the language for its ease of use and clarity of syntax in the description of a VHSIC class chip.

1.2. Digital Signal Processing

Signal processing involves Fourier series analysis of continuous or discrete time-varying signals. With the advent of large-scale integration of digital systems, it became practical to implement complex signal processing functions on a single substrate. System designers began to foresee applications requiring Fourier analysis that were previously infeasible due to the size and/or speed limitations of available analog or digital systems. Some current systems use Fourier analysis as the basis for pattern recognition: a time domain picture is taken and converted into the frequency domain by a fast Fourier transform (FFT) algorithm, and the result is compared with a prestored spectrum to determine the identity of the objects in the field of view. Fourier analysis of seismic feedback from explosions is a primary method of searching for petroleum deposits. Sonar detection of enemy submarines through processing of signal returns is another important defense application. Future applications include not only enhancements of current implementations but also many applications not currently feasible due to the speed and size limitations of current technology. For example, a digital front end for a phased-array radar, real-time computer resolution of satellite imagery, and medical needs such as pictorial representation of internal body organs through low-level X-ray tomography would all benefit from more processing power than is available using today's technology [1]. Advances in device technology must be matched with clever algorithmic design that reduces the computational burden in order to bring these applications into the realm of feasibility.

1.3. Winograd FFT

The Winograd Fourier Transform Algorithm (WFTA) is a method for implementing a Discrete Fourier Transform (DFT) for signal processing. It offers the potential for a tenfold increase in processing throughput over existing signal processing algorithms. A group of AFIT graduate students is designing a WFTA processor that will be implemented using 1.2 µm CMOS technology similar to that developed in VHSIC Phase I.

1.4. Statement of the Problem

The problem addressed in this thesis is to analyze the effectiveness of the VHSIC Hardware Description Language (VHDL) for modeling large CMOS integrated circuits, and to verify the architecture, data flow, and control sequencing of the 16-point Winograd FFT signal processor.

The major portion of the research is directed toward analysis of VHDL as a tool for VLSI design. This analysis covered learning the language syntax, developing a methodology for VHDL modeling, and modeling the primary CMOS circuits that make up the WFTA processor. In support of the WFTA verification effort, a model of the 16-point architecture was developed using the C programming language. This model completely describes the arithmetic and control functions of the processor at the bit level. It verified correct operation of the algorithmic implementation and was exercised to generate test vectors for future VHDL simulations and hardware testing.

1.5. Problem Environment

The research reported in this thesis is one of four related efforts working toward the design and implementation of VLSI signal processors that implement the Winograd Fourier Transform. Captain Kent Taylor [17] developed the architecture of the WFTA chip from the original concept developed by Linderman [8]. Taylor's thesis covers the theoretical development, numerical performance, and control and timing details of the processors. He developed and validated programs that performed FFTs using the 15-, 16-, and 17-point Winograd algorithms. Captain Paul Rossbach [13] designed and implemented the control portion of the WFTA chip. An interim control sequencer test chip was designed, fabricated, and tested at clock rates exceeding 50 MHz. He also designed and implemented a Read Only Memory built from X-shaped storage cells (XROM) to provide the data addresses to the off-chip Random Access Memories (RAMs) in the order specified by the Chinese Remainder Theorem. The XROM has been optimized to minimize the number of transistors through solutions to both the graph partitioning and the traveling salesman problems, using the approach of Kernighan and Lin [7]. Finally, a silicon compiler was written to automatically place the address sequencing scheme into the XROM personalization mask. Captain Paul Coutee [4] developed and implemented the serial adders and multipliers used in the processor's arithmetic section. The multipliers are derived from Lyon's serial multiplier architecture, but were redesigned to use fixed coefficients [9]. In addition, the horizontal and vertical pitch of the cells was minimized; the resulting dense cell structure is critical to achieving the goal of an entire Winograd processor on a single silicon chip. Cells were also designed to check and generate parity and to perform arithmetic rounding of the results.

1.6. Summary of Current Knowledge

Hardware Description Languages are not a new item. As early as 1939, Shannon used a type of HDL in his work on switching circuits [8]. Nor are they rare. In a special IEEE issue on HDLs, Lipovski noted that whenever someone developed a circuit simulator, they felt compelled to develop an HDL to drive it rather than learn an existing one and adapt it to their application [8]. In other special issues on HDLs published by the IEEE Computer Society, writers have called for a common HDL [3]. They noted that although there were many languages adequate for a specific purpose, none was suitable for application over the entire range of a large hardware design project. The IEEE has sponsored project CONLAN (CONsensus LANguage) to develop a family of languages linked by a common syntax and common design conventions. The new language would take desirable features and concepts from the myriad existing languages and incorporate them into its base language, Base CONLAN [12].

Around the same time, the DoD, faced with an explosion in the number of software languages in its computer systems, launched an effort to slow the growth in the cost of software maintenance. After studying the problem, the DoD concluded that computer languages had not kept pace with advances in technology. Accordingly, an effort was made to develop a language incorporating both the features of current languages and modern software engineering concepts such as structured programming, information hiding, data abstraction, real-time control, and data handling. The result was the Ada Programming Language, which has been designated as the standard DoD High Order Language [2]. The VHSIC program office, looking at the problem of concisely communicating design information on integrated circuits containing up to 250,000 gates, recognized that the basic concepts and constructs used in Ada could be used in a new HDL. The relationships between VHDL and Ada are detailed in the VHDL Design Analysis and Justification [6]. In general, VHDL constructs that have counterparts in Ada were required to use the Ada syntax [6]. The basic objectives of the VHDL are:

1. It must be capable of documenting digital hardware over the range from entire systems down to logic gates.

2. It must be usable as both a design and a documentation tool.

3. Its complexity must be kept to a minimum.

A contractor team of Intermetrics, IBM, and Texas Instruments was selected to develop the VHDL. The contract was for a two-phase design effort followed by a testing phase. AFIT was selected as a test site to determine whether the VHDL meets the requirements set forth in the requirements documents and whether it is a practical tool for use in VLSI design development.


1.7. Approach

As has been stated, the main thrust of this thesis effort has been to analyze the effectiveness of the VHSIC Hardware Description Language (VHDL) as a tool for modeling the detailed design of a VHSIC class chip. This analysis is accomplished by structurally decomposing the 16-point processor into its constituent processing elements and modeling the primary CMOS circuits that make up those processing elements. In addition, the arithmetic and control sections of the architecture are modeled at the bit level using the C programming language. This serves to verify the correctness of the architecture and the circuit design.

The architecture was modeled by structurally decomposing the system into subsystems, the components of those subsystems, macrocells, and finally the microcells that make up the macrocells. Decomposing the architecture led to the definition of the hardware interfaces. This top-down interface definition imposed a signal flow structure on the system, within which the internal circuitry was then defined. Once the chip was decomposed into its smallest individual logic components, the micro- and macrocells were modeled using VHDL library descriptions as well as user-defined descriptions. In this fashion, the system could be reconstructed following the previously defined interfaces.

Subgoals of the modeling process were to establish functional equivalence between the simulation program and the actual hardware, to develop test cases that simulate various data sets, and to develop test vectors for use in future VHDL simulations and hardware testing.

1.8. Sequence of Presentation

Chapter 2 reports on the development of the architecture of a signal processor based on the Winograd algorithm. Details on the Winograd Transform, the Good-Thomas prime factor algorithm, the Chinese Remainder Theorem, and their implications for the system architecture are included in this chapter.


Chapter 3 presents the VHDL constructs used to model hardware. The hardware entity is described as it is used in the VHDL, and examples are used to illustrate a methodology to follow when modeling circuits.

Chapter 4 details the modeling of the 16-point processor using VHDL. The 16-point processor is completely decomposed into the smallest independent circuits, inverters and transmission gates, which are then used to construct the primary cells. The VHDL descriptions and models of the major cells are presented.

Chapter 5 presents the C simulation used to verify the 16-point architecture. A discussion of the need for system simulation is presented, followed by a description of the general approach used in program development.

Chapter 6 is an analysis of the utility of the VHDL as a VLSI design tool. Recommendations for applications of the C simulations are presented. Finally, the recommendations and conclusions based on the research performed while carrying out this thesis are presented.
CHAPTER 2


Development of the WFTA Architecture


2.1. Overview

As stated in the first chapter, the Winograd DFT algorithm is a computationally efficient method of computing the discrete Fourier transform. It is of interest in VLSI because the matrix form of the algorithm maps very efficiently, in terms of both area and regularity of structure, into a signal processing architecture. In addition, by combining various Winograd modules into a pipelined architecture in the manner specified by the Good-Thomas Prime Factor algorithm, large data blocklengths may be computed. The development of an architecture based on these algorithms is discussed in this chapter. The approach is to introduce the Fourier transform and how it is used in signal processing applications, and then to demonstrate how a more efficient implementation of the basic Fourier transform leads to the architecture modeled in this thesis. A 4080-point blocklength is initially assumed and later justified in Section 2.3.1. Concepts introduced in this chapter include the Good-Thomas Prime Factor algorithm, the Chinese Remainder Theorem, the Winograd Fast Fourier Transform algorithm, and cyclic convolution.


2.2. Fourier Series Representation

Most signals of interest in communications or signal processing applications can be described as a function of time by the equation:

$$f(t) = A \sin(\omega t + \phi) \qquad \text{(2-1)}$$

where $A$ is the signal amplitude,

$\phi$ is the signal phase, and

$\omega$ is the frequency in radians/sec.


Signals that satisfy the relaxed Dirichlet conditions [15]:

1. f(t) has only a finite number of maxima and minima in the interval,

2. f(t) has only a finite number of discontinuities in the interval, and

3. f(t) satisfies the inequality

$$\int_0^T |f(t)|\,dt < \infty \qquad \text{(2-2)}$$

may be represented by the Fourier series, which is defined as:

$$f(t) = \sum_{n=-\infty}^{\infty} F_n\, e^{jn\omega t} \qquad \text{(2-3)}$$

where $F_n$ is a complex coefficient representing the initial phase angle and magnitude of the $n$th component, and

the exponential $e^{jn\omega t}$ represents phasor rotation at angular frequency $n\omega$.


By summing the phasors $F_n e^{jn\omega t}$ over the index $n$, the instantaneous amplitude and phase of the original signal can be determined. In addition, the squared magnitudes of the Fourier coefficients, $|F_n|^2$, can be summed to find the average signal power (Parseval's theorem). A plot of the Fourier coefficients versus frequency is known as the spectrum of the signal. Two characteristics of the spectrum are that (1) its envelope depends on the pulse shape, and (2) there is an inverse relationship between pulse width and frequency spread.


The Fourier transform, used to calculate the Fourier coefficients, is defined by the equation:

$$F(\omega) = \int_{-T_0}^{T_0} f(t)\, e^{-j\omega t}\, dt \qquad \text{(2-4)}$$

Using the Fourier transform, we can describe any signal of the form (2-1) in terms of a spectral density function of the form (2-4).
2.3. Fast Fourier Transforms


Many of the problems in digital signal processing involve computation of the Discrete Fourier Transform (DFT) for finite input sequences of real or complex data points. The DFT of a complex data sequence $v$ is given by:

$$V_k = \sum_{i=0}^{N-1} \omega^{ik} v_i, \qquad k = 0, 1, \ldots, N-1 \qquad \text{(2-5)}$$

where $N$ is the blocklength of the data sequence,

$\omega$ is the complex phasor $e^{-j2\pi/N}$, and

$v$ is a vector of complex numbers.

The DFT can be computationally expensive: the number of complex additions and multiplications is $O(N^2)$. For example, a direct implementation of a 4080-point DFT requires $4080^2 = 16{,}646{,}400$ multiplications and $4080 \times 4079 = 16{,}642{,}320$ additions. The body of theory labeled Fast Fourier Transforms is concerned with manipulating the input and output data indices in order to perform the DFT more efficiently. FFT algorithms generally use a variety of methods to shuffle elements around in the data matrices to reduce the number of multiplications required. Figures of merit for FFT algorithms revolve around the numbers of additions and multiplications, with replacement of multiplications by additions being the preferred approach to a more efficient algorithmic implementation. Fast multipliers are costly in terms of silicon area and processing time. Replacing multiplications with additions reduces the space required by the multiplier section and decreases latency through the pipeline. The space freed up can then be used to allow a smaller die size, resulting in greater yield, or to implement desirable features such as error detection, error correction, and other fault tolerance measures.

The Winograd Fourier Transform Algorithm (WFTA) architecture was developed using both the Winograd and the Good-Thomas Prime Factor algorithms. The Good-Thomas algorithm is used to break the 4080-point blocklength into sequences of mutually prime lengths 15, 16, and 17. These smaller blocklengths are computed using the Winograd FFT algorithms. Combining the Good-Thomas Prime Factor Algorithm (PFA) and the WFTA in this fashion reduces the number of operations to 31,148 multiplications and 157,164 additions [17]. This reduces the number of multiplications by a factor of over 500 (16,646,400 / 31,148 ≈ 534). We now examine the theory that allows the 4080-point DFT to be decomposed in order to achieve these reductions.


2.3.1. Good-Thomas Prime Factor Algorithm. The Good-Thomas PFA allows a linear array of N data points to be represented as an m-dimensional array in such a manner that the one-dimensional DFT can be calculated as a true m-dimensional Fourier transform, that is, as a sequence of smaller one-dimensional DFTs along each dimension. The Chinese Remainder Theorem (CRT) is used to map the sequential data addresses onto unique locations in a three-dimensional cube. In order to use the CRT, the decomposition factors m1, m2, and m3 must be relatively prime (sharing no common factors). Considerations in selecting a WFTA blocklength were the computational efficiency of the pipeline and adaptability to existing signal processing systems.
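The CRT address mapping can be sketched in a few lines of C. This is a hypothetical software analogue of the address generation, not the XROM contents: `crt_split` takes a linear address in [0, 4080) to its residues modulo 15, 16, and 17 (its coordinates on the cube), and `crt_join` reconstructs the address from those residues with the standard CRT formula. The round trip succeeding for all 4080 addresses is exactly the uniqueness property that requires the factors to be relatively prime.

```c
#include <assert.h>

/* Factors of the 4080-point PFA decomposition: m1 = 15, m2 = 16, m3 = 17. */
enum { M1 = 15, M2 = 16, M3 = 17, N_PFA = M1 * M2 * M3 };

/* Forward map: linear address n -> residues (n mod 15, n mod 16, n mod 17). */
void crt_split(int n, int *r1, int *r2, int *r3)
{
    assert(0 <= n && n < N_PFA);
    *r1 = n % M1;
    *r2 = n % M2;
    *r3 = n % M3;
}

/* Multiplicative inverse of a modulo m by search (fine for small m). */
static int inv_mod(int a, int m)
{
    for (int x = 1; x < m; x++)
        if ((a * x) % m == 1)
            return x;
    assert(!"a and m are not relatively prime");
    return -1;
}

/* Inverse map by the Chinese Remainder Theorem:
 *   n = sum_i r_i * (N/m_i) * [(N/m_i)^{-1} mod m_i]   (mod N). */
int crt_join(int r1, int r2, int r3)
{
    const int m[3] = { M1, M2, M3 };
    const int r[3] = { r1, r2, r3 };
    long n = 0;
    for (int i = 0; i < 3; i++) {
        int Ni = N_PFA / m[i];
        n += (long)r[i] * Ni * inv_mod(Ni % m[i], m[i]);
    }
    return (int)(n % N_PFA);
}
```

For example, `crt_join(1, 0, 0)` returns 2176, the unique address in the block that is 1 modulo 15 and 0 modulo both 16 and 17.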


Pipelined architectures achieve maximum efficiency when all processors require approximately the same time to compute each problem. Current radar systems use 4096-point scans for signal processing, but may be adapted for other block sizes. For these reasons, the decomposition factors m1 = 15, m2 = 16, and m3 = 17 [8] were chosen; this balances the processing delay through all stages of the pipeline. The product of the decomposition factors, m1 × m2 × m3 = 15 × 16 × 17, equals 4080, close to the 4096-point scan length. This can be thought of as mapping the 4080 data points into a cubic data structure with sides of length 15, 16, and 17. The sides of the cube are the blocklengths of the decomposed DFT. The entire DFT can then be computed by piping the output of one stage into the input of the next. Using the PFA, we can rewrite the 4080-point DFT, originally given by (2-5) with N = 4080, into the following form by remapping the input and output data indices using the CRT:

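$$V_{k_1 k_2 k_3} = \sum_{i_3=0}^{16} \left[ \sum_{i_1=0}^{14} \left[ \sum_{i_2=0}^{15} v_{i_1 i_2 i_3}\, \omega_{16}^{i_2 k_2} \right] \omega_{15}^{i_1 k_1} \right] \omega_{17}^{i_3 k_3} \qquad \text{(2-7)}$$

where $v_{i_1 i_2 i_3}$ and $V_{k_1 k_2 k_3}$ are the input and output sequences remapped onto the $15 \times 16 \times 17$ cube, and $\omega_m = e^{-j2\pi/m}$.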

Now, instead of performing one 4080-point transform, we perform a 16-point DFT (15)(17) = 255 times, a 15-point DFT (16)(17) = 272 times, and a 17-point DFT (15)(16) = 240 times. Taylor performed a numerical simulation of the 4080-point pipeline, and the results showed that the best ordering of the DFT modules is the 16-point DFT first, then the 15-point, then the 17-point. The 16-point DFT has the best numerical performance, while the 17-point DFT has the worst; ordering the pipeline in this fashion minimizes truncation and rounding noise [17].

The combined effect of the PFA and the CRT is shown figuratively in Figure 2-1. The CRT maps each element of the 4080-point data sequence into a unique address on a 15 × 16 × 17 cube. The 4080-point DFT is then computed as three successive sets of short DFTs, one set along each axis of the cube. For example, the 15-point DFT can be visualized as a 16 × 17 array of columns with 15 elements per column. This is represented by the XZ plane in Figure 2-1. Complete computation of the DFT requires a set of column DFTs perpendicular to each of three faces of the cube. The summation notation in equation (2-7) above reflects the DFT being computed at each stage and the order of the passes through the data set.

Thus, computation of the DFT is performed in a pipelined implementation as follows:

a) Compute all columns perpendicular to the XZ plane and map the outputs via the CRT into new locations on the cube (16-point DFT).

b) Compute all columns perpendicular to the YZ plane and map the outputs via the CRT into new locations on the cube (15-point DFT).

c) Compute all columns perpendicular to the XY plane and map the outputs via the CRT into new locations on the cube (17-point DFT).

Figure 2-1. Cubic Data Structure of the 4080-Point PFA Implementation

This conceptualization leads directly to the pipelined architecture of the 4080-point DFT processor, shown in Figure 2-2. In hardware, the cube is a memory element of 4080 words whose data addresses are determined by the CRT, and the array of columns represents a 15-, 16-, or 17-point WFTA processor element. Dual 4080-word memories are used between WFTA elements so that each element has exclusive access to a 4080-word data cube. After each element completes a scan through the data set, the results are sent to the next memory element in the pipeline.

Figure 2-2. 4080-Point WFTA Pipeline Implementation

2.3.2. Winograd Fast Fourier Transform. Dr. Shmuel Winograd first introduced the Winograd Fast Fourier Transforms in 1975 [18]. A key characteristic of these algorithms is that the number of multiplications is nearly O(N), while the number of additions remains in the neighborhood of that required by other FFT algorithms. Winograd's algorithms are used to compute each of the 15-, 16-, and 17-point DFTs. The small algorithms treat three cases of blocklength:

1. Blocklength a prime.

2. Blocklength a power of a prime.

3. Blocklength a power of two.

Cases one and three, respectively, are used to compute the 17- and 16-point DFTs. The 15-point DFT does not fall under any of the cases listed above. To compute this DFT, and other blocklengths that are not one of the cases listed above, Winograd's large algorithm must be used. The large algorithm combines smaller blocklengths, which can be computed using the small algorithms, into a larger DFT module. The 15-point module may be computed using blocklengths of three and five, both of which are prime (case one).
2.4. WFTA Processor Architecture

The 16-point DFT, given in (2-8), is computed using case three, blocklength a power of two:

$$V_k = \sum_{i=0}^{15} \omega^{ik} v_i, \qquad \omega = e^{-j2\pi/16}, \quad k = 0, 1, \ldots, 15 \qquad \text{(2-8)}$$

Convolution theory allows the DFT of a data sequence to be written as a cyclic convolution. Using the procedure for the second case (16 = 2^4 is also a power of the prime 2), the 16 data points are partitioned into sets of even and odd indices. The eight odd indices are arranged into a set of four cyclic convolutions, while the eight even indices form an eight-point DFT, again a power of two. This partitioning process continues until the resulting DFT is composed of only two points, which may then be directly converted into a cyclic convolution. The theoretical aspects of this process are covered in more detail in [1] and [17]. The basic principle is that the Winograd algorithm converts the DFT into a series of cyclic convolutions. The rationale behind the conversion is that a convolution may be calculated more efficiently using a fast convolution algorithm such as the Winograd Fast Convolution Algorithm.

A cyclic convolution has the form

$$s(x) = g(x)\,d(x) \bmod m(x) \tag{2-9}$$

where d(x) is the data sequence,

g(x) is the coefficient sequence, and

m(x) is a fixed polynomial arising out of the partitioning process.
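When m(x) = x^N - 1, reducing modulo m(x) just wraps exponents around modulo N. Equation (2-9) is then an ordinary length-N cyclic convolution. The C sketch below assumes that particular m(x); the partitioning process can also produce other moduli, which would need a general polynomial remainder. It computes (2-9) by direct polynomial multiplication. The function name cyclic_convolve is illustrative only.

```c
#include <stddef.h>

/* Computes s(x) = g(x) d(x) mod (x^n - 1), i.e. the length-n cyclic
   convolution of g and d. Element i of each array is the coefficient
   of x^i. Assumes m(x) = x^n - 1 (index wrap-around). */
void cyclic_convolve(const long *g, const long *d, long *s, size_t n)
{
    for (size_t k = 0; k < n; k++)
        s[k] = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            /* x^i * x^j = x^(i+j), and x^n == 1 modulo x^n - 1 */
            s[(i + j) % n] += g[i] * d[j];
}
```

For example, g = (1, 2, 3) and d = (4, 5, 6) give s = (31, 31, 28). This direct form uses N^2 multiplications, which is the cost a fast convolution algorithm is meant to reduce.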

Through an application of the Chinese Remainder Theorem for polynomials and some manipulations shown in detail in [17], (2-9) may be converted into the form

$$\mathbf{X} = \mathbf{C}\,\mathbf{D}\,\mathbf{A}\,\mathbf{x} \tag{2-10}$$

where x is the input data vector,

A is an incidence matrix of preadditions (applied first, to the data),

D is a diagonal matrix of coefficients, and

C is an incidence matrix of postadditions.
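A small illustration of this preaddition, diagonal multiplication, and postaddition structure follows. It is not the 16-point module itself. A length-2 cyclic convolution can be computed with two multiplications instead of four. In this sketch, the preaddition matrix applied to the data and the postaddition matrix applied to the products are both [[1, 1], [1, -1]]. The diagonal holds the constants (g0 + g1)/2 and (g0 - g1)/2, which depend only on the coefficients and could therefore be fixed in hardware. The function name is hypothetical; doubles are used because of the factor 1/2.

```c
/* Length-2 cyclic convolution s = g (*) d in postadd * diag * preadd form.
   Direct form for comparison: s0 = g0*d0 + g1*d1, s1 = g0*d1 + g1*d0,
   which needs four multiplications. */
void winograd_cyclic2(const double g[2], const double d[2], double s[2])
{
    /* Diagonal coefficients: precomputable from g alone. */
    const double c0 = (g[0] + g[1]) / 2.0;
    const double c1 = (g[0] - g[1]) / 2.0;

    /* Preadditions on the data. */
    const double a0 = d[0] + d[1];
    const double a1 = d[0] - d[1];

    /* The only two multiplications. */
    const double m0 = c0 * a0;
    const double m1 = c1 * a1;

    /* Postadditions. */
    s[0] = m0 + m1;
    s[1] = m0 - m1;
}
```

With g = (3, 5) and d = (2, 7), both this form and the direct form give s = (41, 31).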


The coefficient matrix D is diagonal, with constant terms that are either purely real or purely imaginary; its dimension equals the number of multiplications to be performed. The architecture that implements this equation in hardware is shown in Figure 2-3. The structure exploits the fact that the data is not complex until the postaddition matrix, where the real and imaginary paths merge in the final postaddition operation. This allows the hardware that implements the preaddition and multiplication operations to be split into separate real and imaginary paths. The arithmetic operations are performed using serial hardware to reduce routing space and complexity. However, the I/O paths are word parallel in order to ease the memory access time constraint. Additional structures needed are a control sequencer to generate control signals and a ROM to store the data addresses in the order specified by the Chinese Remainder Theorem.
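The serial arithmetic mentioned above can be illustrated with a bit-serial adder. Operands arrive least significant bit first, one bit per clock, and a single full adder with a carry flip-flop produces one sum bit per clock. The C sketch below models that behavior for unsigned words. The word length, the LSB-first ordering, and all names are assumptions for illustration, not details taken from the WFT16 design.

```c
#include <stdint.h>

/* A bit-serial adder: one full adder plus a carry flip-flop. */
typedef struct {
    unsigned carry; /* carry flip-flop, cleared at the start of each word */
} serial_adder;

/* One clock cycle: consume one bit of each operand, emit one sum bit. */
static unsigned serial_adder_clock(serial_adder *sa, unsigned a, unsigned b)
{
    unsigned sum = a ^ b ^ sa->carry;
    sa->carry = (a & b) | (a & sa->carry) | (b & sa->carry);
    return sum;
}

/* Adds two nbits-wide words (nbits <= 32) by streaming them LSB first.
   Returns the nbits-wide sum; the final carry out is discarded. */
uint32_t serial_add(uint32_t x, uint32_t y, unsigned nbits)
{
    serial_adder sa = { 0 };
    uint32_t result = 0;
    for (unsigned i = 0; i < nbits; i++) {
        unsigned bit = serial_adder_clock(&sa, (x >> i) & 1u, (y >> i) & 1u);
        result |= (uint32_t)bit << i;
    }
    return result;
}
```

For example, serial_add(100, 27, 8) returns 127 after eight clocks. The hardware cost is one full adder regardless of word length, which is the routing and area saving the serial approach trades against latency.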

Winograd's large algorithm could have been used to compute the entire 4080-point DFT by nesting the 15-, 16-, and 17-point modules, as in the case of the 15-point DFT. However, the size of the multiplication matrix limits the ability to embed such a processor on a single silicon chip. For example, the 4080-point DFT would require a multiplication matrix over 23,000 serial multipliers tall [17]. A more modular approach, better suited to VLSI implementation with state-of-the-art fabrication technology, uses the Winograd modules to compute the 15-, 16-, and 17-point DFTs and the Good-Thomas Prime Factor Algorithm (PFA) to compute the entire transform. This implementation requires more operations, but it is more area efficient and lends itself to a pipelined implementation with greater computational throughput.
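The Good-Thomas PFA works because 15, 16, and 17 are pairwise coprime, so each index 0 <= n < 4080 corresponds to a unique triple (n1, n2, n3). The C sketch below uses one standard choice of index maps: a Ruritanian map on the input and a Chinese Remainder Theorem map on the output. The thesis hardware may order or address the data differently, and all function names are hypothetical. The check confirms that, with these maps, the kernel W_4080^(nk) splits into the product of the 15-, 16-, and 17-point kernels. The 4080-point transform therefore becomes independent small DFTs along each factor, with no twiddle-factor multiplications between stages.

```c
#include <stdbool.h>

/* Good-Thomas index maps for N = 15 * 16 * 17 = 4080 (pairwise coprime). */
enum { NF = 3, N_TOTAL = 4080 };
static const int FACT[NF] = { 15, 16, 17 };

/* Inverse of a modulo m by exhaustive search; m is small here. */
static int inverse_mod(int a, int m)
{
    for (int t = 1; t < m; t++)
        if ((a * t) % m == 1)
            return t;
    return 0; /* not invertible: the factors would not be coprime */
}

/* Input (Ruritanian) map: n = sum_i n_i * (N / N_i) mod N. */
int pfa_input_index(const int n[NF])
{
    int idx = 0;
    for (int i = 0; i < NF; i++)
        idx += n[i] * (N_TOTAL / FACT[i]);
    return idx % N_TOTAL;
}

/* Output (CRT) map: k = sum_i k_i * M_i * t_i mod N, with M_i = N / N_i
   and t_i the inverse of M_i modulo N_i. */
int pfa_output_index(const int k[NF])
{
    int idx = 0;
    for (int i = 0; i < NF; i++) {
        int m = N_TOTAL / FACT[i];
        idx += k[i] * m * inverse_mod(m % FACT[i], FACT[i]);
    }
    return idx % N_TOTAL;
}

/* Checks that both maps are one-to-one onto 0..N-1 and that
   n*k mod N == sum_i n_i*k_i*(N/N_i) mod N for every pair, i.e. that
   W_N^(nk) = W_15^(n1 k1) * W_16^(n2 k2) * W_17^(n3 k3). */
bool pfa_maps_valid(void)
{
    int in_map[N_TOTAL], out_map[N_TOTAL], digit[N_TOTAL][NF];
    bool seen_in[N_TOTAL] = { false }, seen_out[N_TOTAL] = { false };
    int c = 0;

    for (int a = 0; a < FACT[0]; a++)
        for (int b = 0; b < FACT[1]; b++)
            for (int e = 0; e < FACT[2]; e++, c++) {
                int d[NF] = { a, b, e };
                for (int i = 0; i < NF; i++)
                    digit[c][i] = d[i];
                in_map[c] = pfa_input_index(d);
                out_map[c] = pfa_output_index(d);
                if (seen_in[in_map[c]] || seen_out[out_map[c]])
                    return false; /* two digit triples share an index */
                seen_in[in_map[c]] = seen_out[out_map[c]] = true;
            }

    for (int p = 0; p < N_TOTAL; p++)
        for (int q = 0; q < N_TOTAL; q++) {
            int lhs = (in_map[p] * out_map[q]) % N_TOTAL;
            int rhs = 0;
            for (int i = 0; i < NF; i++)
                rhs += digit[p][i] * digit[q][i] * (N_TOTAL / FACT[i]);
            if (lhs != rhs % N_TOTAL)
                return false;
        }
    return true;
}
```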
CHAPTER 3


VHSIC  Hardware  Description  Language 

3.1.  Overview 

The Winograd Fourier Transform (WFT16) processor presented in the last chapter is a complex circuit consisting of over 100,000 transistors. Complete comprehension of the function of every transistor would be impossible if the circuit were considered as a monolithic entity. An alternative approach is to understand the function of the major components and how they interact with the rest of the circuit. Detailed understanding comes from repeating this process, each time at a lower level, until the entire processor can be visualized as a grouping of simple circuits interacting in a complex manner. This is what VHDL modeling is all about. "VHDL is a language which can be used to describe hardware, ranging from simple logic gates to complex digital systems" [5]. The VHDL allows a circuit's behavior to be described at a convenient level of understanding (or abstraction); more detail may be observed by stepping down one level in the hierarchy and describing the behavior of the components and the interactions which together create the larger behavior.

This chapter presents VHDL in the following context. A VLSI circuit of almost 29,000 transistors will be described, and its behavior modeled using the hierarchical procedure described above. Along the way, the syntax of VHDL will be presented as a tool for describing circuit behavior. The VHDL structures used to represent a physical device, namely entities and bodies, will also be addressed. This description will be followed by a presentation of the constructs used to model data transforms, such as sequential and concurrent signal assignment statements and bus resolution functions. Finally, a complete example of VHDL modeling of a CMOS latch will be given.


3.2.  VHDL  Modeling  of  a  Large  Circuit 

The output structure of the 16-point WFTA (WFT16) processor is a serial-in, parallel-out (SIPO) shift register. Every clock cycle, one bit from each of 32 serial input vectors enters the register, and every other clock cycle a forty-eight-bit vector is output in parallel to the data bus. This process is controlled by three signals. Thus, at the highest level of abstraction, the SIPO may be viewed as a black box which receives inputs and produces an output. By itself, this description says little about how the SIPO operates. Referring to the system block diagram reveals that the thirty-two-bit input is actually made up of two sixteen-bit vectors, and the output is two twenty-four-bit words. This allows a second, lower level of behavior to be visualized: the SIPO is really composed of two smaller, identical shift registers. Continuing in this fashion, each register is found to consist of sixteen identical rows, each row made up of twenty-four identical cells. If the behavior of this one cell can be understood, it is easy to visualize the operation of the entire 29,000-transistor register. The decomposition of the SIPO behavior from one register into a cell is shown in Figure 3-1. Although not all circuits are arrays of identical cells, most behaviors may be decomposed into a lower level of abstraction, which can then be more easily understood and modeled.
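The decomposition just described can be mirrored directly in data structures. The C sketch below is hypothetical. It encodes only the counts given above (2 registers, 16 rows, 24 cells). The row behavior is an assumption, not taken from the actual cell design: each row shifts in one serial stream LSB first and is read out as a 24-bit word.

```c
#include <stdint.h>

/* Hypothetical structural model: SIPO = 2 registers,
   register = 16 rows, row = 24 cells, cell = 1 bit. */
enum { SIPO_REGS = 2, SIPO_ROWS = 16, SIPO_CELLS = 24 };

typedef struct { unsigned bit; } sipo_cell;
typedef struct { sipo_cell cell[SIPO_CELLS]; } sipo_row;
typedef struct { sipo_row row[SIPO_ROWS]; } sipo_register;
typedef struct { sipo_register reg[SIPO_REGS]; } sipo;

/* Assumed row behavior: the new serial bit enters cell 23 and older bits
   move one cell toward cell 0, so after 24 clocks of an LSB-first stream
   cell i holds bit i of the word. */
static void row_shift_in(sipo_row *r, unsigned serial_in)
{
    for (int i = 0; i < SIPO_CELLS - 1; i++)
        r->cell[i].bit = r->cell[i + 1].bit;
    r->cell[SIPO_CELLS - 1].bit = serial_in & 1u;
}

/* One clock: bit j of serial_bits is the next bit of serial input j.
   Inputs 0-15 feed register 0 and inputs 16-31 feed register 1. */
void sipo_clock(sipo *s, uint32_t serial_bits)
{
    for (int j = 0; j < SIPO_REGS * SIPO_ROWS; j++)
        row_shift_in(&s->reg[j / SIPO_ROWS].row[j % SIPO_ROWS],
                     (unsigned)(serial_bits >> j));
}

/* Parallel output: the same row of each register read as a 24-bit word,
   giving the forty-eight-bit output as two words. */
void sipo_read_row(const sipo *s, unsigned row, uint32_t out[SIPO_REGS])
{
    for (int k = 0; k < SIPO_REGS; k++) {
        uint32_t word = 0;
        for (int i = 0; i < SIPO_CELLS; i++)
            word |= (uint32_t)s->reg[k].row[row].cell[i].bit << i;
        out[k] = word;
    }
}
```

Only row_shift_in describes cell behavior. Everything above it is pure structure, which is the point of the hierarchical view: once the cell is understood, the rest follows from the counts.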

3.3.  VHDL  Modeling  Structures 

There are three independent units in VHDL: packages, entities, and bodies. A VHDL description of a piece of hardware consists of the interface (called an entity in the VHDL syntax) and an architectural description of how the device transforms inputs to outputs (a body in the syntax). Related type declarations, functions, and procedures can be grouped into a package and made available to the interface. VHDL defines two different types of information channels. Ports are the wires used to interconnect entities, while signals are used to carry information internal to a design entity. Furthermore, ports are declared in interface entities, while signals are declared in bodies.

Figure 3-1. SIPO Decomposition

3.3.1. Packages. A package is an Ada-derived structure used to group logically related items so they can be referenced by a group of related design entities. Items which may be placed inside a package are type declarations, attribute declarations and specifications, constants, alias declarations, functions, and procedures. The contents of a package are made visible to interface declarations with a context clause:

with SOME_PACKAGE; use SOME_PACKAGE;

at the head of the entity declaration.

There are two kinds of types in VHDL, scalar and composite. Scalar types are single-valued, such as:

a) Integer types

b) Floating point types

c) Enumeration types

d) Physical types

Enumeration types are declared by listing the values which objects of that type may have. A bit, which may take on the values '0' or '1', and a boolean, which takes the values true or false, are examples of predefined enumeration types. Physical types represent physical parameters such as time, voltage, current, and so forth. Mathematical operations are defined on physical types. Composite types represent collections of values. A composite may contain elements of only one type (such as an array of bit) or of different types, such as a record listing the voltage and current requirements of a circuit.

VHDL also permits user-defined types. One example would define OPCODE as an array of eight bits; CPU instructions could then be declared as being of type OPCODE. As another example, an enumeration type TRI_STATE could be defined with the set of values ('0', '1', 'Z'). However, functions and operations on objects of user-defined types would then have to be defined by the user.

The SIPO would have various data types associated with the decomposition shown in Figure 3-1. The input and output at each level exhibit a certain word length, which may be declared as a bit vector; a tristate data type is also needed; and the control and clock signals may form additional data types. Thus, a SIPO_PACKAGE would contain the following declarations:

package SIPO_PACKAGE is
   type 16_bit_vector is bit_vector (15 downto 0);
   type 24_bit_vector is bit_vector (23 downto 0);
   type clk_signal is bit;
   type Z_bit is ('0', '1', 'Z');
   type control is bit;
end SIPO_PACKAGE;

VHDL is a strongly typed language. Although control and clk_signal are both derived from type bit, the compiler will flag assignments of clk_signal values to signals declared of type bit, and vice versa.

3.3.2. Interface Declarations. The interface declaration defines the data paths (ports) over which information flows to and from a device. It consists of a list of port declarations giving the direction and type of information which may flow through each port. An interface description may be written for each level of decomposition of the SIPO shown in Figure 3-1; it would list all the inputs and outputs of the circuit at that level. There is generally only one interface declared per design entity, but different implementations may reference the same interface. Items common to all bodies of that interface may also be included in the entity.

There are five port modes used to describe information flow across the interface boundary. Mode in is used for data entering a device from an external source. In ports may only be referenced, not changed, within that entity. However, in ports may be given a tie-off value in the interface declaration for use if the port is not connected to an external driver during a simulation. Mode out is used for data originating within a device for use in some external circuit. Its value, representing the result of an internal data transform, may not be read within the originating device. Mode inout is a bidirectional port which may be driven externally or internally (as in a system bus). Mode buffer allows the port to be referenced (read) by components both inside and outside the device boundary; however, it must be driven by a source within the entity defined by the interface. An example of buffer mode is the feedback inverter of a static latch. Mode linkage is used for ports whose direction of travel is unknown. This port mode is used only to pass information down to lower levels of the hierarchy; it can be neither referenced nor altered. Table 3-1 summarizes the port modes and allowed operations.

Table 3-1. Port Modes and Allowed Operations

Port Mode    Read inside entity    Driven inside entity
in           yes                   no
out          no                    yes
inout        yes                   yes
buffer       yes                   yes
linkage      no                    no


SERIAL_IN and PARALLEL_IN are the input data bits; SERIAL_OUT and PARALLEL_OUT are the output bits. SR_SIPO, SD_SIPO, and LATCH_SIPO are control signals. Circuit operation is under the control of a two-phase clock. The interface declaration for this cell is shown in Figure 3-3.

This example illustrates the use of port modes and user-defined types. Types control and clk_signal are defined in SIPO_PACKAGE. The in ports are given tie-off values, which will be used if the port is not connected in some simulation model. It is important to note what is happening at node A in Figure 3-2. First, the node is driven by both the input port PARALLEL_IN and the output port SERIAL_OUT. Any node driven by more than one independent source is termed a "bus" in VHDL syntax. Busses must be declared by type and have an associated bus resolution function, which determines how the several source values are resolved into a signal value. Bus resolution functions will be discussed in section 3.5.3. SERIAL_OUT is also used to drive both internal and external nodes, and this must be reflected in the port mode. Both inout and buffer modes could be used; in this case, mode buffer was chosen to reflect that node A should not be driven from an external source. The assertion construct reflects the design intent that only one of the two drivers of node A should be active during any simulation cycle. If this assertion is violated, the simulator will report the error message shown. The severity level is referenced at simulation time, along with a user-specified error threshold, to determine which errors should cause termination of the run.

with SIPO_PACKAGE; use SIPO_PACKAGE;
entity SIPO_CELL
   (SERIAL_IN, PARALLEL_IN: in bit := '0';
    SR_SIPO, SD_SIPO, LATCH_SIPO: in control;
    CLK2, CLK2_NOT, CLK1, CLK1_NOT: in clk_signal := '0';
    SERIAL_OUT: buffer bit;
    PARALLEL_OUT: out bit) is

   assert (not (LATCH_SIPO and SD_SIPO))
      report "LATCH_SIPO AND SD_SIPO are both set"
      severity fatal;

end SIPO_CELL;

Figure 3-3. Interface Declaration

3.3.3. Bodies. Many different approaches may be used to implement a given function. In general, however, the inputs and outputs of the function are fairly well defined early in the design cycle. VHDL supports design flexibility by allowing multiple architectural bodies to be written for a given interface. Each architectural body may be tested by linking the interface description to the body using a configuration block statement or a configuration body.

The body defines how the device actually operates. There are two types of bodies, architectural and configuration. An architectural body describes the behavioral and/or structural characteristics of a particular implementation. A configuration body defines how a particular instantiation of a design entity is to be implemented. The configuration body is the link between the entity declared in the component declaration, the specific architectural body the designer wishes to use, and the component instantiation.

3.3.3.1. Architectural Bodies. An architectural body may describe the operation of a device at different levels. A purely structural description uses only component declarations and instantiations to describe its operation. A purely behavioral description, on the other hand, contains no component instantiations: all data transforms are described using concurrent signal assignment or process statements inside the architecture. VHDL allows any combination of these two extremes to be used to model a device. The basic structure of an architectural body is shown in Figure 3-4.


architecture NAME_OF_BODY of NAME_OF_INTERFACE_DECLARATION is

   block_name: block (boolean guard expression)

      declarative section:
         component declaration;
         component configuration;
         local signal declaration;

   begin

      concurrent statements
      processes
      nested block_name;

   end block block_name;

end NAME_OF_BODY;

Figure 3-4. Architectural Body Template


Note that these statements and declarations could be in any order within their respective regions.

A digital system operates as multitudes of circuits, all running in parallel. The WFT16 processor is a pipelined, bit-serial processor designed to operate at high speed; as such, there are few logic stages between clocked elements. Thus, the circuit can be roughly partitioned into just two sets of parallel operations: those which occur on the φ2 pulse and those which occur on the φ1 pulse. The block structure which makes up the architectural body shown in Figure 3-4 is the structure that VHDL uses to represent parallel events. All the statements contained within the block execute concurrently and may be controlled by the boolean guard. The guard expression specifies a condition which must be true before statements within the block that reference the guard can execute. This will be discussed further in the sections on signal assignment statements.


Before a lower-level VHDL description can be referenced in a higher-level description, it must be declared. This is done by listing the interface entity name, followed by a listing of its port names, data types, and modes. A component declaration makes a local copy of an interface available to a body. To describe the behavior of the SIPO_CELL in terms of the behavior of its components, those components would first have to be declared as shown in Figure 3-5.

component MSFF
   port (A: in Z_bit;
         CLK2, CLK2_NOT, CLK1, CLK1_NOT: in clk_signal;
         B: buffer bit);

component T_GATE
   port (X: in Z_bit;
         CLK: in clk_signal;
         Y: out bit);

Figure 3-5. Component Declaration Statement

Note the type clash between the inputs and outputs of the MSFF and T_GATE. The MSFF produces an output of type bit, and the T_GATE expects its input to be of type Z_bit. Since these are connected as per Figure 3-2, a type conversion function must be used to convert the output of the MSFF into the type that the T_GATE expects.
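Such a conversion is straightforward in one direction, since every bit value has a Z_bit counterpart. The reverse direction is not, because 'Z' has no bit value. The C sketch below mirrors the two VHDL types with enumerations. convb_z follows the name used in the text. The reverse function convz_b and its if_floating parameter are hypothetical, added to show the policy decision the reverse direction forces.

```c
/* C stand-ins for the VHDL types bit and Z_bit ('0', '1', 'Z'). */
typedef enum { BIT_0, BIT_1 } bit;
typedef enum { ZBIT_0, ZBIT_1, ZBIT_Z } z_bit;

/* convb_z: bit -> Z_bit, total and lossless. */
z_bit convb_z(bit b)
{
    return b == BIT_1 ? ZBIT_1 : ZBIT_0;
}

/* Hypothetical reverse conversion: 'Z' has no bit counterpart, so the
   caller must state what a floating node should read as. */
bit convz_b(z_bit z, bit if_floating)
{
    switch (z) {
    case ZBIT_0: return BIT_0;
    case ZBIT_1: return BIT_1;
    default:     return if_floating;
    }
}
```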

A component instantiation statement fits a declared component into the framework of the design. This is done by interconnecting the ports of the instantiated component with ports declared within the interface and with locally defined signals. A component instantiation is a concurrent construct and will be discussed further in section 3.5.2.

Information transfer within an architecture takes place using signals and ports. Ports, which are listed in the interface declaration for the register, are connected to the port with the same name in the component instantiation statement or in signal assignment statements. Signals are declared by name and by type. A signal with multiple drivers (a bus, as stated earlier) is additionally declared with the reserved word atomic, followed by the name of a bus resolution function. Atomic thus flags that multiple drivers are associated with the signal, and the function named after it resolves the drivers into an output value.

3.3.3.2. Configuration Blocks and Bodies. Since interface declarations can be associated with more than one implementation (body), it is necessary to identify which body is to be used. A component can be associated with a specific body either by a configuration specification within the body or by a separate configuration body. The disadvantage of placing the configuration within an architectural body is that the architecture is then tied to one design: flexibility to instantiate different components is lost unless the code is edited and recompiled. The more flexible approach defines a separate configuration body for each design. This allows different configuration bodies to be written for different component instantiations within the same architectural framework.
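The separation of interface, alternative bodies, and configuration has a loose analogue in C. A function-pointer type stands in for the interface, several functions serve as alternative bodies, and a lookup by name plays the role of the configuration. This is only an analogy, not VHDL semantics. The latch bodies and the names BEHAVIOR and GATES below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The "interface": every body maps data input, clock, and the stored
   value to a new stored value. */
typedef unsigned (*latch_body)(unsigned d, unsigned clk, unsigned q_prev);

/* Body 1: a behavioral description. */
static unsigned latch_behavior(unsigned d, unsigned clk, unsigned q_prev)
{
    return clk ? d : q_prev;
}

/* Body 2: the same function written as a gate-level multiplexer. */
static unsigned latch_gates(unsigned d, unsigned clk, unsigned q_prev)
{
    return (clk & d) | (!clk & q_prev);
}

/* The "design library": bodies that may be bound to the interface. */
typedef struct {
    const char *name;
    latch_body body;
} library_entry;

static const library_entry LIBRARY[] = {
    { "BEHAVIOR", latch_behavior },
    { "GATES", latch_gates },
};

/* The "configuration": bind a body by name, so switching implementations
   does not require editing the code that uses the interface. */
latch_body configure(const char *body_name)
{
    for (size_t i = 0; i < sizeof LIBRARY / sizeof LIBRARY[0]; i++)
        if (strcmp(LIBRARY[i].name, body_name) == 0)
            return LIBRARY[i].body;
    return NULL; /* no such body in the library */
}
```

The code that calls through a latch_body never changes when the binding changes, which is the property the separate configuration body provides.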

A  configuration  specification  assigns  a  specific  body  to  be  used  with  the  interface  in 
the  component  declaration.  It  may  also  specify  port  maps,  additional  ports,  and  generic 
declarations.  The  label  used  in  a  component  instantiation  statement  identifies  which 
instance  of  the  component  is  being  configured.  Figure  3-6  shows  the  configuration  of  the 
MSFF  and  T_GATE  used  in  the  SIPO_CELL  declared  in  Figure  3-5. 


for M1, M2: MSFF
   use
      entity (MSFF)
      port map (bit_in => A,
                CLK2 => CLK2, CLK2_NOT => CLK2_NOT,
                CLK1 => CLK1, CLK1_NOT => CLK1_NOT,
                bit_out => B)
      body (MIXED_BODY)
end for;

for all: T_GATE
   use
      entity (T_GATE)
      port map (bit_in => convb_z(X),
                clk => CLK,
                bit_out => Y)
      body (BEHAVIOR)
end for;

Figure  3-6.  Configuration  Specification  for  SIPO_CELL 


Note the use of the type conversion function, convb_z(X), in the port map for the T_GATE. Also note that since both MSFF instantiations use the same configuration, the instantiation labels M1 and M2 could be replaced with the reserved word all, as in the T_GATE configuration. If several instances of the same entity are configured and all but one share the same configuration, the different one could be configured first, as above, and the rest identified with

for others: MSFF use

and configured in one block.

The entry entity (MSFF) identifies which interface entity is used; it links the component declaration to an entity stored in the VHDL design library. The port map associates the formal names listed in the entity with those used in the component declaration; association is written left to right, formal => actual. The statement body (MIXED_BODY) directs that the architectural body MIXED_BODY be used for these instances.

The configuration body describes the implementation of any entities in an architectural body which are referenced by component instantiation statements. Design units referred to by the configuration body must reside in the design library. The configuration body is the top level of the hierarchy for the entity listed in the body header, and its main content is the configuration specification.

Configuration of component instantiations two levels deep is allowed in a configuration body. For example, if the SIPO_CELL is instantiated to build a SIPO_REGISTER_ROW, a configuration body could configure the SIPO_CELL hierarchy down one additional level, allowing different MSFF implementations to be simulated within the framework of the SIPO_REGISTER.

3.4.  Signals 

VHDL uses signals to represent wire interconnections within a design entity, and ports to represent data channels to external devices. Input-to-output transforms in VHDL are represented by a future signal value and the time when it will become valid. This time/value pair is called a transaction. The time may be expressed as a delta delay or as a simulation time value. A delta delay is an infinitesimally small time unit; no number of delta delays ever adds up to a finite amount of physical time (in terms of circuit delays). Delta delay is used to represent events which must occur in response to other events without considering the nuances of their timing interaction. Simulation time represents real time and is used to simulate timing dependencies between components and events.

The form of a simple signal assignment statement is:

A <= B after T ns;

This reads "the value of signal B is assigned to the driver of signal A and will possibly become the value of signal A after T nanoseconds." It is not possible to affect the current value of a signal, only its future values. A simple example repeated from the VHDL tutorial will serve to illustrate this point.

"Consider  the  following  pairs  of  statements: 

A := B;          X <= Y;
B := A;          Y <= X;

Variable Assignment    Signal Assignment

In the case of variables A and B, after the two variable statements are executed, the values of A and B are identical after the two statements are executed. More interesting, however, is the fact that the values of X and Y will be swapped as soon as simulation time advances, because the current value of each signal has been scheduled to become the next value of the other signal's driver (after delta delay)." [5]
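The swap in the quoted example can be mimicked in C. Each signal gets a current value and a scheduled driver value, and a single commit step stands in for the simulator's delta cycle. The type and function names are hypothetical; the sketch only reproduces the ordering rules described in the quotation.

```c
/* A signal holds its current value and the value scheduled on its driver. */
typedef struct {
    int current;
    int next;
} sim_signal;

/* A := B; B := A;  variables change immediately. */
void variable_pair(int *a, int *b)
{
    *a = *b;
    *b = *a; /* reads the A just written, so both end up equal to old B */
}

/* X <= Y; Y <= X;  each assignment only schedules a driver value. */
void signal_pair(sim_signal *x, sim_signal *y)
{
    x->next = y->current; /* X <= Y */
    y->next = x->current; /* Y <= X: X's current value is still the old one */

    /* After a delta delay, the scheduled driver values become current. */
    x->current = x->next;
    y->current = y->next;
}
```

Starting from A = 1, B = 2 and X = 1, Y = 2, the variables both end at 2, while the signals end swapped (X = 2, Y = 1).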

The value of a signal depends on the values of all of its drivers. (Some nodes, such as node A of the SIPO_CELL, have multiple drivers; these multiply driven nodes are known as busses.) When a signal assignment statement is executed, it inserts a transaction into a signal driver. The signal driver can be thought of as a stack ordered with respect to time, time being the stack pointer. As simulation time advances, it reaches the time of each transaction in turn. If the signal has only one driver, that driver's value becomes the signal value at the time indicated by the stack pointer. Signals with multiple drivers have their values arbitrated by a bus resolution function, which is usually written by the VHDL programmer and is invoked automatically by the simulator.
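A bus resolution function reduces the current values of all drivers of a bus to a single signal value. The C sketch below resolves a tristate bus such as node A. The policy is a hypothetical choice, not one taken from the thesis. A bus with no active driver floats ('Z'). Otherwise it takes the first active value, and a flag is raised when active drivers disagree, the situation the SIPO_CELL assertion is meant to catch.

```c
#include <stddef.h>

typedef enum { ZB_0, ZB_1, ZB_Z } z_bit;

/* Hypothetical resolution policy for a tristate bus:
   - no active (non-'Z') driver: the bus floats ('Z');
   - otherwise: the bus takes the first active driver's value;
   - active drivers that disagree: *conflict is set to 1. */
z_bit resolve_z_bit(const z_bit *drivers, size_t count, int *conflict)
{
    z_bit result = ZB_Z;
    *conflict = 0;
    for (size_t i = 0; i < count; i++) {
        if (drivers[i] == ZB_Z)
            continue;             /* a disabled driver contributes nothing */
        if (result == ZB_Z)
            result = drivers[i];  /* first active driver */
        else if (drivers[i] != result)
            *conflict = 1;        /* two active drivers disagree */
    }
    return result;
}
```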

A signal assignment statement creates a "projected output waveform" for a signal. Once the projected output waveform is put on the stack, but before it becomes the current driver value, it may be affected by signal assignment statements which execute at some later point. In other words, assignment to a node with only one signal driver does not guarantee that, at some future point in time, the value of the assignment will become the signal value. The reserved word transport may appear in an assignment such as:

A <= transport B after T ns;

Transport delay deletes any transactions scheduled for times later than the first transaction in the new waveform. Inertial delay, the default case, also deletes all transactions on the stack which are scheduled to occur before the first transaction in the new projected waveform. Figure 3-7 illustrates transactions and drivers. The stack structure represents a driver for the output of the combinational logic network. A projected signal waveform is shown which assumes that this is the only driver for the signal S. In the absence of any further signal assignments between the current time and the last time on the transaction stack, this will be the future signal waveform.
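The two deletion rules can be sketched on a driver held as a time-ordered list of pending transactions. This C model is simplified and uses hypothetical names. It handles only single-transaction waveforms. Following IEEE VHDL, it treats a pending transaction at exactly the new time as one to be replaced. It also ignores the finer pulse-rejection rules of the IEEE standard.

```c
#include <stddef.h>

#define MAX_TRANS 16

/* One transaction: a value the driver will take at a given time. */
typedef struct {
    long time;  /* e.g. in ns */
    int value;
} transaction;

/* A driver: pending transactions kept in increasing time order. */
typedef struct {
    transaction pending[MAX_TRANS];
    size_t count;
} driver;

/* Transport delay: discard pending transactions at or after time t,
   then append the new transaction (t, v). Returns -1 if full. */
int schedule_transport(driver *d, long t, int v)
{
    while (d->count > 0 && d->pending[d->count - 1].time >= t)
        d->count--;
    if (d->count == MAX_TRANS)
        return -1;
    d->pending[d->count].time = t;
    d->pending[d->count].value = v;
    d->count++;
    return 0;
}

/* Inertial delay (simplified): also discard every pending transaction
   scheduled before t, so the new transaction is the only one left. */
int schedule_inertial(driver *d, long t, int v)
{
    d->count = 0;
    return schedule_transport(d, t, v);
}
```

With transactions pending at 10 ns and 20 ns, a new transport assignment for 15 ns leaves the 10 ns and 15 ns transactions. An inertial assignment for 15 ns leaves only the 15 ns one, which is how inertial delay filters out pulses shorter than the delay.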

3.5.  Signal  Assignment  Statements 

VHDL supports two types of signal assignment statements: those which execute sequentially (sequential signal assignment) and those which execute simultaneously (concurrent signal assignment). Sequential statements, which must be nested inside a concurrent process statement, are an abstract means of describing I/O transforms, while concurrent statements lean more toward a specific hardware implementation.

3.5.1. Sequential Assignment Statements. An algorithmic approach to hardware modeling would use a sequence of calculations to map inputs into outputs. Sequential statements in VHDL are used for this purpose. They must be nested within a region known as a process. The process statement is itself a concurrent statement which may execute once per simulation cycle; when the process executes, however, each sequential statement executes in turn. Each process executes in response to changes in the signals currently enabled in its "sensitivity list." A sensitivity list identifies all the signals which can trigger a change in an output signal value. Every time a signal in the sensitivity list changes state, the process is activated and computes a new projected output waveform.

Figure 3-7. Signals and Drivers


The sensitivity list provides a means to improve the execution time of a simulation. Consider simulation of a D-latch, the primary cell structure for most components within the WFT16 processor. The latch has one data input and two clock inputs of opposite sense. Since any of these three signals can affect the output, they all will be listed in the sensitivity list. However, the output changes only in response to changes in the data input. Needless event scheduling can be avoided by activating the process only when the input has changed, not just because the clock "ticked." Enable and disable statements are used for this purpose. All three signals must be listed in the sensitivity list, but the enable statement may be used to enable sensitivity to clock transitions only if the input has just changed state. While the input remains stable, the disable statement makes the latch process insensitive to clock transitions.
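The enable/disable idea can be sketched in C. This is a hypothetical model, not the WFT16 latch. It collapses the complementary clock pair into a single clk value and lets the output follow the input while clk is 1. It also counts how often the process body runs. Clock sensitivity is switched on only while the stored output does not yet match the current input.

```c
/* Hypothetical latch process: q follows d while clk is 1. */
typedef struct {
    unsigned d, clk, q;
    int clock_sensitive;  /* 1 = clock events "enabled", 0 = "disabled" */
    unsigned evaluations; /* how many times the process body ran */
} latch_process;

/* The process body. Once q holds d, further clock events cannot change
   q until d changes, so clock sensitivity is disabled. */
static void latch_eval(latch_process *p)
{
    p->evaluations++;
    if (p->clk)
        p->q = p->d;
    p->clock_sensitive = (p->q != p->d);
}

/* The data input is always in the sensitivity list. */
void latch_input_event(latch_process *p, unsigned d)
{
    if (d == p->d)
        return; /* no change, no event */
    p->d = d;
    latch_eval(p);
}

/* Clock events run the process only while sensitivity is enabled. */
void latch_clock_event(latch_process *p, unsigned clk)
{
    if (clk == p->clk)
        return;
    p->clk = clk;
    if (p->clock_sensitive)
        latch_eval(p);
}
```

With a stable input, clock edges cost no evaluations. Only the input change and the clock edge that follows it run the process body.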


Another feature of processes is that they may use variables and constants to compute a value. Since variable assignment occurs immediately (instead of at some point in the future, as in the case of signals), an arbitrarily complex algorithm may be used to compute a value and assign it to a signal node. This feature can be used to model propagation through layers of combinational logic within one simulation cycle. If each logic stage is modeled with a delta-delay signal assignment, the signal takes one simulation cycle of propagation delay per stage. If variable assignment statements are used instead, propagation delays through gates are not a factor, and delta delay simulation can still be used to simulate clocked stages. For computing variables, VHDL supports most of the control statements used in programming languages, such as for loops, case statements, and if ... then ... else. These control constructs may also be used to assign a value to a target signal based on the value of a variable.
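The difference between the two styles can be sketched for a chain of three inverters. Both functions are hypothetical illustrations. `propagate_signals` mimics delta-delay signal assignment: in each delta cycle every stage reads the values left by the previous cycle. `propagate_variables` mimics variable assignment inside a single execution of one process.

```python
def invert(bit):
    return "1" if bit == "0" else "0"


def propagate_signals(nodes, new_input):
    """Inverter chain in which each stage drives a signal with delta delay.
    nodes[0] is the input and nodes[k] is the output of stage k. Every
    stage reads the old values, so a change advances one stage per delta
    cycle. Returns (settled_nodes, delta_cycles)."""
    current = [new_input] + nodes[1:]
    cycles = 0
    while True:
        nxt = [current[0]] + [invert(v) for v in current[:-1]]
        if nxt == current:
            return current, cycles
        current, cycles = nxt, cycles + 1


def propagate_variables(stages, new_input):
    """The same chain computed with variables in one process execution:
    each assignment is visible to the next statement immediately."""
    values = [new_input]
    for _ in range(stages):
        values.append(invert(values[-1]))
    return values
```

Starting from a settled chain and changing the input, three stages take three delta cycles with signals and a single process execution with variables. Both styles settle to the same values.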

3.5.2. Concurrent Signal Assignment Statements. "Concurrent statements allow the user to specify the structural characteristics of a design, and to describe its behavioral characteristics in terms of concurrently executing, sequential processes" [5]. Concurrent statements represent hardware components that operate in parallel upon receipt of some control signal or clock pulse.

The block statement defines a region of text and an optional guard expression that can affect the execution of processes within that block. Blocks are delimited by statements of the form:

block (optional guard) ... end block;

The guard is a boolean expression that concurrent statements reference with the reserved word memoried. Memoried statements fire only when the guard expression is true. If the guard is false, changes to the signals will not cause output transitions. Block statements group together statements that execute in parallel (on the same clock pulse, for instance). Processes can import the guard value by inserting the guard into their sensitivity lists. Signals can be enabled or disabled depending on the guard value. If the guard is omitted, its value defaults to true.
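The effect of a guard on memoried and ordinary statements can be sketched as follows. `GuardedBlock` is a hypothetical helper, not VHDL syntax. Each statement is a target name plus a Python function of the current signal values.

```python
class GuardedBlock:
    """Hypothetical sketch of a guarded block: memoried statements update
    their targets only while the guard is true; other statements always do."""

    def __init__(self, guard=None):
        # An omitted guard defaults to true, as described in the text.
        self.guard = guard if guard is not None else (lambda env: True)
        self.statements = []            # (target, expression, memoried)

    def add(self, target, expression, memoried=False):
        self.statements.append((target, expression, memoried))

    def run(self, env):
        """Evaluate every statement once against env and apply the results."""
        guard_value = self.guard(env)
        updates = {}
        for target, expression, memoried in self.statements:
            if memoried and not guard_value:
                continue                # guard false: target keeps its value
            updates[target] = expression(env)
        env.update(updates)             # all targets update together
        return env
```

With a guard of `clk = '1'`, a memoried copy of `d` holds its old value while the clock is low. An unguarded copy follows `d` on every run.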

The scope of the guard extends only to the nearest enclosing block statement. Blocks can be nested, but a guard is passed into a nested block only explicitly. Block statements consist of declarative and executable parts, and the architecture body is an example of a block structure. Component and signal declarations follow the block label. The reserved word begin marks the start of the executable part. Component instantiations, conditional signal assignments, and processes fall within this section. The end block; statement ends the scope of the guard, but the guard may be imported into a nested block by declaring a port for the guard and assigning the value of the port to that guard.

Component instantiation statements make use of a unit defined with a component declaration by listing the signals that are to be connected to the ports named in the declaration. Ports can be associated by name, by position, or by a combination of both. Named association explicitly links each local signal to a port name declared in the component declaration. Positional association is implicit: local signals are identified by their position in the instantiation list with respect to the ports listed in the component declaration. If the two methods are combined, all positional associations must come first, followed by the named associations.

A purely structural description of the SIPO_CELL could be written by instantiating the MSFFs and T_GATEs and connecting them through their port lists:

T1: T_GATE port(SERIAL_IN, SR_SIPO, S_IN);
M1: MSFF   port(S_IN, CLK2, CLK2_NOT, CLK1, CLK1_NOT, SERIAL_OUT);
T2: T_GATE port(PARALLEL_IN, SD_SIPO, P_IN);
T3: T_GATE port(SERIAL_OUT, LATCH_SIPO, P_IN);
M2: MSFF   port(P_IN, CLK2, CLK2_NOT, CLK1, CLK1_NOT, PARALLEL_OUT);

These statements will execute whenever one of the signals listed in a port list changes.

This method of modeling provides a great deal of information about the device interconnections, but not much about its operation. Other concurrent statements can be used to convey more information about the behavior of the device. Since the transmission gates are mainly used to control the inputs to the MSFFs, the clarity of the description may be improved by using conditional signal assignment statements. The three transmission gate instantiations are then replaced as shown below.

S_IN <= SERIAL_IN when SR_SIPO = '1'
        else 'Z';

P_IN <= PARALLEL_IN when SD_SIPO = '1'
        else SERIAL_OUT when LATCH_SIPO = '1'
        else 'Z';
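The two conditional signal assignments above reduce to the following selection logic. The sketch is in Python with '0', '1', and 'Z' as signal values, and the function names are hypothetical. The first true condition wins, as in a `when ... else` chain.

```python
def t_gate(data, control):
    """Transmission gate: out <= data when control = '1' else 'Z'."""
    return data if control == "1" else "Z"


def p_in(parallel_in, sd_sipo, serial_out, latch_sipo):
    """Two T_gates sharing node P_IN, written as one conditional assignment."""
    if sd_sipo == "1":
        return parallel_in
    if latch_sipo == "1":
        return serial_out
    return "Z"
```

Because the conditions are checked in order, P_IN takes PARALLEL_IN whenever SD_SIPO is high, even if LATCH_SIPO is also high.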

A SIPO_REGISTER_ROW could be constructed using twenty-four instantiations of the SIPO_CELL. This would be a very cumbersome way to model a regular array of cells. VHDL provides a more efficient way through the generate statement. The VHDL model for a SIPO_REGISTER_ROW is shown in Figure 3-8 below.


for i in (23 downto 0) generate
  if i = 23 generate
    SIPO(23): SIPO_CELL
      port(SERIAL_IN, PARALLEL_IN(i), CLK2, CLK2_NOT, CLK1,
           CLK1_NOT, SERIAL_OUT(i), PARALLEL_OUT(i));
  end generate;

  if i < 23 generate
    SIPO(i): SIPO_CELL
      port(SERIAL_OUT(i+1), PARALLEL_IN(i), CLK2, CLK2_NOT, CLK1,
           CLK1_NOT, SERIAL_OUT(i), PARALLEL_OUT(i));
  end generate;
end generate;


Figure 3-8. SIPO_CELL_ROW Model
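The wiring produced by the generate statement, and the shift it implements, can be sketched as follows. Both helpers are hypothetical. `shift` assumes that on each shift cell i takes the old value of cell i+1 and cell 23 takes the serial input.

```python
def build_row(n=24):
    """Mirror of the generate statement: cell n-1 takes SERIAL_IN and every
    other cell i takes SERIAL_OUT(i+1). Returns cell index -> input name."""
    wiring = {}
    for i in range(n - 1, -1, -1):               # for i in (23 downto 0)
        if i == n - 1:
            wiring[i] = "SERIAL_IN"
        else:
            wiring[i] = f"SERIAL_OUT({i + 1})"
    return wiring


def shift(row_bits, serial_in):
    """One shift of the chain: cell i gets the old bit of cell i+1 and the
    last cell gets serial_in. row_bits[i] is the bit stored in cell i."""
    return [row_bits[i + 1] for i in range(len(row_bits) - 1)] + [serial_in]
```

After twenty-four shifts, the first bit shifted in reaches cell 0. With the least significant bit entering first, cell 0 therefore ends up holding the LSB.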


3.5.3. Bus Resolution Functions. Each concurrent statement that assigns to a node creates a separate driver for that node. Because such a signal cannot update its current value without considering the values of all of its drivers, these signals are said to be atomic. Bus type signals are declared using the reserved word atomic, followed by the name of the bus resolution function and the data type:

atomic BUS_RESOLUTION_FUNCTION_NAME data_type;

Other data types may also be atomic. This simply means that the elements of an object, such as a record or bit vector type, are inseparable. No single element can be assigned individually; all elements must be updated in parallel. If the programmer tries to update a single element, an error will be flagged.

Bus resolution is the means by which multiple drivers are resolved into a single value. The function is defined by the user and invoked by the simulator each time a new value becomes current on one of the signal's drivers. One convenient feature is that an atomic signal has no fixed number of drivers. Additional components may be hung on the bus simply by assigning to that signal name. The function is called implicitly during simulation, and its argument is an unconstrained array of the signal's type. An example of a bus resolution function for tristate signals is shown in section 3.4.
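A resolution function of this kind can be sketched in Python. `latch_resolve` follows the LATCH_RESOLVE function given with the latch example: the first driver that is not 'Z' wins, and the node floats at 'Z' if every driver is 'Z'. `ResolvedSignal` is a hypothetical stand-in for an atomic signal that keeps one driver per assigning source.

```python
from typing import Callable, Dict, Sequence


def latch_resolve(drivers: Sequence[str]) -> str:
    """Return the first driver value that is not 'Z'; otherwise 'Z'."""
    for value in drivers:
        if value != "Z":
            return value
    return "Z"


class ResolvedSignal:
    """Hypothetical atomic signal: each source owns one driver, and the
    visible value is always computed by the resolution function."""

    def __init__(self, resolve: Callable[[Sequence[str]], str]):
        self.resolve = resolve
        self.drivers: Dict[str, str] = {}

    def assign(self, source: str, value: str) -> None:
        self.drivers[source] = value   # a new source simply adds a driver

    @property
    def value(self) -> str:
        return self.resolve(list(self.drivers.values()))
```

This rule does not detect two conflicting drivers. It is adequate for the latch because the T_gate and the clocked inverter are controlled by opposite clock phases, so at most one of them drives node A at a time.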

An explicit function call can also be used to produce bus-resolution-type behavior. The function call would list the signals, both control and data, that could affect a node, and return the value of the future signal driver. Bus resolution via a function call is used in the LATCH example in the next section.

3.6. CMOS Latch Example

The latch is the building block for all clocked elements within the processor. A clocked CMOS latch, shown in Figure 3-9, will demonstrate how VHDL is used to model hardware. The box surrounding the latch represents the distinction between entities and bodies. Entities describe the interface of the latch to external circuitry, and bodies describe how the internal hardware performs. The Latch interface consists of the signal lines IN, OUT, CLK, and CLK_BAR. Therefore, a simple VHDL interface description can be written as shown below.


with Latch_package; use Latch_package;
entity Latch
  (BIT_IN: in Z_bit;
   BIT_OUT: buffer Z_bit;
   CLK, CLK_BAR: in CLK_SIGNAL := '0') is
end Latch;


In this example, BIT_IN is a signal driven by a source external to the Latch. Its value will be used by the device, but it may not be changed within the boundaries of the latch, because port mode in is the read-only mode. The port BIT_OUT is the output signal and also the signal source for the feedback loop. The port mode buffer is used because it requires that the signal source be interior to the body, but it also allows the value to be read within the body. Since a buffer port may not be driven by a source external to the body, circuitry outside the latch can only read it. The clock signals CLK and CLK_BAR are of type CLK_SIGNAL. This is a user-defined type, and the context clause (with ... use ...) implies that it has been defined in the package Latch_package. Node A illustrates the requirement for a bus resolution function. This node may be driven by two separate components, the T_gate and the clocked inverter. A detailed architecture description is shown below.


architecture one_description of Latch is

description_blk:
block

  -- The double dash is the comment delimiter in VHDL.

  -- The component declarations provide a copy of the device for
  -- use within the body.

  component T_gate port(A: in Z_bit; B: out Z_bit; C: in bit);
  component Inverter port(C: in Z_bit; D: out Z_bit);
  component TRI_STATE_INVERTER port(A: in Z_bit; B: out Z_bit; C: in bit);

  -- This is a "block configuration" for the T_gate used within this
  -- description. The use is a binding indication which ties
  -- together the predefined device T_GATE_INTERFACE to the label T1
  -- in the component instantiation statements. The ports listed in
  -- the T_gate entity description are tied to those listed within the
  -- component statement above. Finally, the body identifies a
  -- particular architectural body to be used with the entity. The
  -- other components are configured in a similar manner.

  for T1: T_gate use
    entity (T_GATE_INTERFACE)
    port map (T_gate_in => A, T_gate_out => B, Control => C)
    body (a_behavior);
  end for;

  signal A: atomic LATCH_RESOLVE Z_bit;
                  -- LATCH_RESOLVE is the bus resolution function
                  -- which will be used to determine the value of node A.
  signal tmp: Z_bit;

begin

  block (CLK = '0' and not BIT_OUT'stable)
                  -- This is the guard expression associated with this block.
  begin
    A <= memoried tmp;

    -- This signal assignment statement will only execute when the guard
    -- is true. Note the two assignments to node A (here and through T1).
    -- If there were other statements within this block that did not use
    -- the word memoried, they would execute regardless of the value of
    -- the guard.

    -- Event scheduling could be minimized by guarding the input node
    -- with the boolean expression (not BIT_IN'stable) and using a
    -- memoried signal assignment statement to signal A. The output of
    -- the latch will remain the same unless the input changes, without
    -- requiring event scheduling on every clock transition.
  end block;

  -- BIT_OUT is both the latch output and the source of the feedback loop.
  T1: T_gate port (BIT_IN, A, CLK);
  I1: Inverter port (A, BIT_OUT);
  TRI1: TRI_STATE_INVERTER port (BIT_OUT, tmp, CLK_BAR);

end block description_blk;
end one_description;

Declaring A to be atomic tells the compiler that the node is driven by more than one source and that the function LATCH_RESOLVE will be used to determine its value. The function is located within the package Latch_package, as shown below. Once the latch is built and tested, it may be declared in the same manner as the T_gate in this example.




package Latch_package is

  type CLK_SIGNAL is ('0', '1');
  type CONTROL_SIGNAL is (RST, SHIFT_RIGHT, LOAD);
  type Z_bit is ('0', '1', 'Z');
  type Z_bit_vector is array (integer range <>) of Z_bit;
                  -- unconstrained array type for the resolution function

  function R_LATCH_RESOLVE (RST, BIT_IN, L_BIT: Z_bit)
    return Z_bit is
    constant RST_SIGNAL : Z_bit := '0';  -- This assigns a value of '0'
                                         -- to RST_SIGNAL.
  begin
    if (RST = '1') then
      return RST_SIGNAL;
    elsif (BIT_IN = '0' or BIT_IN = '1') then
      return BIT_IN;
    else
      return L_BIT;
    end if;
  end R_LATCH_RESOLVE;

  function LATCH_RESOLVE (input: Z_bit_vector)
    return Z_bit is
    variable output: Z_bit := 'Z';       -- result if every driver is 'Z'
  begin
    for I in input'low to input'high loop
      if input(I) /= 'Z' then
        output := input(I);
        exit;
      end if;
    end loop;
    return output;
  end LATCH_RESOLVE;

end Latch_package;


The function R_LATCH_RESOLVE would be called to resolve the inverter input value in a circuit such as the one shown in Figure 3-10. The signal assignment that calls the function to return the output value would be written as follows:

A <= R_LATCH_RESOLVE(RST, BIT_IN, INVERT_OUT);

Note from the circuit diagram that RST is implemented here in a behavioral fashion, rather than as a description of the circuit (structural) implementation.

Figure 3-10. Resettable CMOS Latch
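The priority order of R_LATCH_RESOLVE can be checked with a direct Python translation (the function name here is hypothetical). Reset forces '0'. Otherwise a driven input ('0' or '1') passes through. Otherwise the node keeps the fed-back latch value.

```python
def r_latch_resolve(rst: str, bit_in: str, l_bit: str) -> str:
    """Reset has priority, then a driven input, then the feedback value."""
    if rst == "1":
        return "0"          # RST_SIGNAL
    if bit_in in ("0", "1"):
        return bit_in       # T_gate is conducting
    return l_bit            # input floating: hold the latched value
```

Modeling the reset this way is behavioral: the function states what node A should be and does not describe the transistors that force it.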


This is a very detailed, in-depth description of a latch. A much more compact VHDL description can be written, and it will also execute much more efficiently. This description models the function that the latch performs, rather than the subcomponents that implement that function. Behavioral descriptions focus entirely on function at the expense of detail. An alternate latch description is shown below:
architecture BEHAVIOR of LATCH is
block (not bit_in'stable)
begin
  process (guard, clk, bit_in)
  begin
    if (guard and bit_in /= 'Z') then
      enable clk;
    else
      disable clk;
    end if;

    if (clk = '1') then
      bit_out <= not bit_in;
    end if;
  end process;
end block;
end BEHAVIOR;

This description uses a process statement to assign the inverted input to the output. Several conditions are placed upon this assignment in order to minimize the number of transactions placed in the driver for signal bit_out. In general, we wish to avoid scheduling unless the new output value is different from the previous one. For the output to be assigned a value, these conditions must be met: (1) the input value must have changed, (2) the input must not be the high-impedance value, and (3) the clock must be high. This description models the same function as the preceding example, but it eliminates transactions caused by clock transitions, and it executes only if it will put a different output value in the driver.
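The effect of the three conditions on the transaction count can be sketched by replaying a list of (clock, input) samples. This is an interpretation, not the VHDL semantics. Condition (1) is read as "the input has changed since the last output update." `naive_transactions` is a hypothetical baseline that schedules a transaction on every clock-high sample.

```python
def behavioral_latch(stimulus):
    """Replay (clk, bit_in) samples under the three conditions.
    Returns (final_output, transactions)."""
    prev_in, armed, out, transactions = None, False, None, 0
    for clk, bit_in in stimulus:
        if bit_in != prev_in:                      # condition 1: input changed
            prev_in = bit_in
            armed = bit_in != "Z"                  # condition 2: not high impedance
        if armed and clk == "1":                   # condition 3: clock high
            out = "0" if bit_in == "1" else "1"    # bit_out <= not bit_in
            transactions += 1
            armed = False                          # wait for the next input change
    return out, transactions


def naive_transactions(stimulus):
    """Baseline: one transaction for every clock-high sample."""
    return sum(1 for clk, _ in stimulus if clk == "1")
```

On a stimulus of five clock pulses in which the input changes only twice before floating, the behavioral rule produces two transactions, while the baseline produces five.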


3.7.  Complete  SIPO  Modeling 

Finally, we are in a position to write a complete SIPO description in VHDL. This section pulls together the previous examples and applies the principle used in the last latch example: avoiding unnecessary CPU overhead caused by redundant event scheduling. The methodology used in this section parallels the methodology used for the complete WFT16 model.

In general, we wish to model and simulate circuits at a level of detail sufficient to observe the functionality and operability of the unit cell, but not at the level of every signal node switching as the clock ticks.

The SIPO was decomposed into a single cell, the SIPO_CELL, in section 3.1. Figure 3-2 shows that the primary components are MSFFs and T_gates. As stated earlier, transmission gates (T_gates) are primarily used to gate inputs with control signals or clock pulses. Thus, we shall model them behaviorally with conditional signal assignment statements. The MSFFs, on the other hand, are built from two latches of the type modeled in the previous section. Instead of building an MSFF from two instantiations of a latch, we shall use the same principles to model the MSFF behaviorally.
architecture behavior of MSFF is
block
  signal mid : bit;
begin

  block (not bit_in'stable)
  begin
    process (guard, bit_in, clk2)
    begin
      if (guard and bit_in /= 'Z') then
        enable clk2;
      else
        disable clk2;
      end if;

      if (clk2 = '1') then
        mid <= not convz_b(bit_in);
      end if;
    end process;
  end block;

  block (not mid'stable)
  begin
    process (guard, mid, clk1)
    begin
      if (guard) then
        enable clk1;
      else
        disable clk1;
      end if;

      if (clk1 = '1') then
        bit_out <= not convb_z(mid);
      end if;
    end process;
  end block;

end block;
end behavior;
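The two-phase behavior of this MSFF, and its use as a delay element, can be sketched in Python. The class and function are hypothetical. The sketch makes three assumptions. In each clock cycle every clk2 latch samples before any clk1 latch updates. A 'Z' input leaves the master latch unchanged. The reset state is mid = '1' and output = '0'.

```python
class MSFF:
    """Master-slave flip-flop as two inverting latches: the master loads on
    clk2 and the slave on clk1. Two inversions make the cell non-inverting."""

    def __init__(self):
        self.mid, self.out = "1", "0"            # assumed consistent reset state

    def phase2(self, bit_in):
        if bit_in != "Z":                        # a floating input is ignored
            self.mid = "0" if bit_in == "1" else "1"   # mid <= not bit_in

    def phase1(self):
        self.out = "0" if self.mid == "1" else "1"     # bit_out <= not mid


def delay_line(stream, length):
    """Chain `length` MSFFs. Each cycle all clk2 latches sample (reading the
    outputs left by the previous clk1 phase), then all clk1 latches update.
    Returns the last flip-flop's output for every cycle."""
    ffs = [MSFF() for _ in range(length)]
    outputs = []
    for bit in stream:
        inputs = [bit] + [ff.out for ff in ffs[:-1]]
        for ff, b in zip(ffs, inputs):
            ff.phase2(b)
        for ff in ffs:
            ff.phase1()
        outputs.append(ffs[-1].out)
    return outputs
```

A single MSFF passes each bit to its output within the same cycle. Each additional MSFF in the chain adds one clock cycle of delay, which is how delay MSFFs keep bit streams aligned in a bit-serial pipeline.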

Using this description as a building block, we may now efficiently model the SIPO_CELL.
architecture MIXED of SIPO_CELL is
block
  component MSFF port (a: in Z_bit; CLK2, CLK1: in clk_signal;
                       z: out bit);
  -- Defer configuration of MSFF; do it in a separate configuration body.
  signal from_ser, to_parallel: Z_bit;
begin

  -- Pass a new, driven serial input bit only while shift_right is high.
  process (shift_right, ser_in)
  begin
    if ((ser_in'stable nor (ser_in = 'Z')) and shift_right = '1') then
      from_ser <= ser_in;
    end if;
  end process;

  -- Load the parallel flip-flop from the row above (shift_down) or from
  -- the serial flip-flop (latch), reacting only to changed inputs.
  process (parallel_in, ser_out, latch, shift_down)
  begin
    if (shift_down nor latch) then
      disable ser_out, parallel_in;
    else
      enable ser_out, parallel_in;
    end if;

    if ((shift_down = '1') and (not parallel_in'stable)) then
      to_parallel <= parallel_in;
    elsif ((latch = '1') and not (ser_out'stable)) then
      to_parallel <= ser_out;
    end if;
  end process;

  ff_ser: MSFF port (from_ser, CLK2, CLK1, ser_out);
  ff_par: MSFF port (to_parallel, CLK2, CLK1, p_out);

end block;
end MIXED;

Using this mixed description of the SIPO_CELL, the SIPO may be described as an array of these cells. Modeling the SIPO as an array [16][24] of SIPO_CELLs must be done in two steps: first, construct a [1][24] row of cells, and then instantiate this row sixteen times to build an array [16][1] of rows. The SIPO_CELL is configured at the row level. Since our goal was to observe the WFT16 at the functional level, the SIPO_CELL is the highest level at which we will attempt to model things behaviorally. Above this level, things will be modeled at a purely structural level. It is possible to use the hierarchical modeling facilities of VHDL to model behaviorally at much higher levels, but that will not be done here. The interface description for the SIPO_ROW is shown below.


with SIPO_PACKAGE; use SIPO_PACKAGE;
entity SIPO_ROW
  (BIT_IN: in z_bit;
   WORD_IN: in 24_bit_vector;
   CLK2, CLK2_NOT, CLK1, CLK1_NOT: in clk_signals := '0';
   SHIFT_RIGHT, SHIFT_DOWN, LATCH: in control;
   WORD_OUT: out 24_bit_vector) is
end SIPO_ROW;

architecture Structure of SIPO_ROW is
block
  signal SERIAL_INT: bit_vector(23 downto 1);
                  -- cell i drives SERIAL_INT(i), which feeds cell i-1
begin

  for i in (23 downto 0) generate
    if i = 23 generate
      SIPO(23): SIPO_CELL
        port(SERIAL_IN, PARALLEL_IN(i), CLK2, CLK2_NOT, CLK1,
             CLK1_NOT, SERIAL_INT(i), PARALLEL_OUT(i));
    end generate;

    if ((i < 23) and (i > 0)) generate
      SIPO(i): SIPO_CELL
        port(SERIAL_INT(i+1), PARALLEL_IN(i), CLK2, CLK2_NOT, CLK1,
             CLK1_NOT, SERIAL_INT(i), PARALLEL_OUT(i));
    end generate;

    if i = 0 generate
      SIPO(0): SIPO_CELL
        port(SERIAL_INT(i+1), PARALLEL_IN(i), CLK2, CLK2_NOT, CLK1,
             CLK1_NOT, PARALLEL_OUT(i));
    end generate;
  end generate;

end block;
end Structure;

The final step, generation of the entire SIPO array as a set of SIPO_ROWs, is similar to the generation of the SIPO_ROW. A special body, SIPO_TOP, will be used as the topmost row in the array. Signal declarations are also required for the parallel inputs and outputs of the rows internal to the structure.

Modeling the SIPO as an array of cells requires several steps and several different architectural bodies and interfaces. As far as the rest of the circuit is concerned, the top-level interface description shown below is the final product of the description process. The detail shown in the circuit modeling is buried in the input-output transform of the SIPO.

entity SIPO
  (SERIAL_IN: in 16_bit_vector;
   WORD_OUT: out 24_bit_vector;
   CLK2, CLK2_NOT, CLK1, CLK1_NOT: in clk_signal := '0';
   SR_SIPO, SD_SIPO, LATCH_SIPO: in control) is
end SIPO;

The architecture could just as easily (actually, much more easily) have been modeled behaviorally. As long as the output bit streams from both simulations are the same, the level of detail of the description is irrelevant.
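To illustrate the point, the sketch below compares a cell-level view of one SIPO frame with a purely behavioral one. Both are hypothetical Python helpers that operate on 16 input bit streams of 24 bits each, LSB first, with one stream per row. They model only the serial shift and the latch into the parallel portion. The later shift-down timing is not modeled.

```python
def sipo_cells(streams, width=24):
    """Cell-level view: each row is a chain of `width` cells. The last cell
    takes the serial input, and each shift moves bits toward cell 0."""
    rows = [[0] * width for _ in streams]
    for t in range(width):                          # SR_SIPO high for `width` cycles
        rows = [row[1:] + [s[t]] for row, s in zip(rows, streams)]
    # LATCH_SIPO: cell i of each row now holds bit i of its word.
    return [sum(bit << i for i, bit in enumerate(row)) for row in rows]


def sipo_behavioral(streams, width=24):
    """Behavioral view: assemble each LSB-first stream directly into a word."""
    return [sum(s[i] << i for i in range(width)) for s in streams]
```

Both views produce the same words, which is exactly the condition under which the level of detail stops mattering to the rest of the circuit.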


CHAPTER  4 


VHDL  Modeling 


4.1.  Overview 

This chapter will present the structural decomposition of the 16-point WFTA (WFT16) processor, leading to the VHDL modeling of its primary circuit components. A top-down decomposition will impose a signal flow on the system that can be used to define the VHDL interface entity. Once the processor is decomposed into its primary cell structures, a hierarchical description of the chip will be facilitated by a bottom-up cell description.

4.2. 16-Point WFTA Processor

The 16-point Winograd algorithm was discussed in Chapter 2. The basic architecture for all of the Winograd processors consists of input/output registers, arithmetic circuitry, special cells for parity and rounding operations, address storage ROMs, and a control sequencer. The primary differences among the actual implementations of the different processors result from the different numbers of arithmetic operations in the pre- and post-addition arrays, and from the height of the serial multiplier array. Balancing the latency among all the processors in the pipeline would require different data word lengths to compensate for the different array sizes.

4.3. Operation

The processor architecture is a pipelined bit-serial machine. The major processing blocks (input and output registers, preadders, multipliers, postadders, and parity circuitry) form the first level of decomposition. Figure 4-1 shows this level of decomposition for the WFT16 processor superimposed over a signal flow graph.
Figure 4-1. Decomposition and Signal Flow of the WFT16 Architecture


The processor can be divided into two separate, identical sections, real and imaginary. These sections are independent through the last column of the postadders. In this column, real and imaginary data are added or subtracted to form the complex outputs. Since the two sides are mirror images, only one side will be discussed and modeled, with the understanding that the other side performs exactly the same operations, in the same sequence, only on a different set of data.

The input register is a parallel-in, serial-out (PISO) register, twenty-four bits wide by sixteen words deep. Each input word consists of twenty-three data bits and one parity bit. Every other clock cycle, the PISO gets a new word from one of the two off-chip input memories, using an address from the XROM. The signal SD_PISO is used to shift the pre-existing words in the parallel portion down one level. After thirty-two clock cycles (sixteen words at one word every two cycles), the PISO is full, and the signal LATCH_PISO goes high to transfer the word-parallel data into the serial shift register. This empties the parallel portion of the PISO, and the cycle repeats for as long as the operate flag remains high. Words in the serial register are shifted into the parity check and zero fill (PC/ZF) cell, least significant bit first, one bit per word, while the signal SR_PISO is high. To allow for numerical growth through the arithmetic pipeline, the data is extended to a thirty-two-bit word length in the PC/ZF cell. Parity is checked and the parity bit stripped in this cell. The PC/ZF cell inserts a delay of one-half clock cycle. Zeroes are inserted below the LSB to scale up the data and improve the signal-to-noise ratio. Sign extensions are appended after the MSB in order to prevent arithmetic overflow. The reader is referred to [17] and [4] for more information.
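The timing of one PISO frame can be sketched as follows. `piso_frame` and `odd_parity_ok` are hypothetical helpers. The sketch assumes that a newly loaded word enters at the top of the parallel section. It also assumes that the parity check in the PC/ZF cell uses odd parity over all 24 bits, the same convention the PR cell uses on output.

```python
def piso_frame(words, width=24):
    """One PISO frame. SD_PISO loads a word every other clock cycle, so the
    parallel section fills in 2 * len(words) cycles. LATCH_PISO then copies
    it to the serial section, and each SR_PISO cycle emits one bit from
    every word, LSB first. Returns (fill_cycles, bit_slices)."""
    parallel = []
    cycles = 0
    for w in words:
        parallel.insert(0, w)     # new word at the top; older words move down
        cycles += 2               # one load every other cycle
    serial = list(parallel)       # LATCH_PISO
    slices = [[(w >> b) & 1 for w in serial] for b in range(width)]
    return cycles, slices


def odd_parity_ok(word, width=24):
    """True when the 24-bit word (data plus parity) has an odd number of 1s."""
    return bin(word & ((1 << width) - 1)).count("1") % 2 == 1
```

`bit_slices[0]` is the first slice sent to the PC/ZF cell: the least significant bit of every word, one bit per word.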

The number of zero fills and sign extensions is determined by the adaptive scaling algorithm, which takes into account the relative magnitude of the input data. Each 4080-point data set is associated with a scale factor that reflects the magnitude of the largest number in the input data set. The scale factor is the smallest number of sign extensions of any number in the set. To avoid overflow, the largest number (scale factor 0) requires five sign extensions. Data sets composed of smaller numbers can replace unneeded sign extensions with zeroes to improve numerical performance.
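The scaling arithmetic can be sketched in Python. After the parity bit is stripped, 23 data bits must fit in a 32-bit word, which leaves 9 extra bits. Five sign extensions at scale factor 0 then leave four zero fills. The sketch assumes that a scale factor s moves s of the sign extensions to the zero-fill side (4 + s zero fills, 5 - s sign extensions), and it limits s to the range 0 through 5. The helper names and this split are assumptions, not the documented behavior of the PC/ZF cell.

```python
DATA_BITS = 23      # data bits left after the parity bit is stripped
WORD_BITS = 32      # word length inside the arithmetic pipeline
GUARD_BITS = 5      # sign extensions needed at scale factor 0


def redundant_sign_bits(x):
    """Leading bits of a 23-bit two's-complement value that only repeat
    the sign bit."""
    magnitude = ~x if x < 0 else x
    return DATA_BITS - 1 - magnitude.bit_length()


def scale_factor(data):
    """Smallest number of redundant sign bits over the whole data set."""
    return min(redundant_sign_bits(x) for x in data)


def zero_fill_extend(x, s):
    """Place x in a 32-bit word with 4 + s zero fills below the LSB and
    5 - s sign extensions above the MSB (assumed split, 0 <= s <= 5)."""
    if not 0 <= s <= GUARD_BITS:
        raise ValueError("shift outside the assumed 0..5 range")
    zero_fills = WORD_BITS - DATA_BITS - GUARD_BITS + s    # 4 + s
    return (x << zero_fills) & ((1 << WORD_BITS) - 1)      # two's-complement pattern
```

For a data set, the shift would be chosen as `min(scale_factor(data), 5)`. Every word then keeps at least five guard bits above its most significant data bit.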

The arithmetic section actually implements the Winograd Fourier Transform. To generate the multiplicand from the output of the PC/ZF, up to four sequential addition/subtraction operations may be needed. Multiplicands generated in fewer than four operations stay aligned with the other elements in the bit vector through the adder/subtractor columns because MSFFs replace the one-delay-wide A/S cells. Most circuit components in the WFT16 have an input φ2 latch and an output φ1 latch. The exceptions to this rule are the PC/ZF, which is a φ2 latch preceded by some combinational logic, and the adder/subtractors (A/S). The A/S cells are reversed: data enters through a φ1 latch and leaves through a φ2 latch. To balance the pipeline with an equal number of φ2 and φ1 latches, some extra latches are placed at each end of the adder arrays. A pipeline view of the preadd section is shown in Figure 4-2.
Figure 4-2. Preadd Pipeline
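Because the data travel least significant bit first, each A/S cell can be built around a single full adder and a carry flip-flop. The sketch below is a generic bit-serial adder/subtractor in Python, not the circuit of the WFT16 cell. Subtraction adds the complement of the second operand with the carry preset to 1. The helper names are hypothetical.

```python
def serial_add_sub(a_bits, b_bits, subtract=False):
    """Bit-serial add or subtract of two LSB-first streams: one full adder
    plus a carry flip-flop. Returns the LSB-first result, truncated to the
    input length."""
    carry = 1 if subtract else 0      # two's complement: a + ~b + 1
    out = []
    for a, b in zip(a_bits, b_bits):
        if subtract:
            b ^= 1                    # complement the subtrahend bit
        out.append(a ^ b ^ carry)     # sum bit leaves this cycle
        carry = (a & b) | (carry & (a ^ b))   # carry is held for the next bit
    return out


def to_bits(x, n=32):
    """LSB-first two's-complement bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Signed value of an LSB-first two's-complement bit list."""
    value = sum(b << i for i, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value
```

Only one bit of each operand is needed per clock cycle, which is why the pipeline can carry every word as a single wire.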


The fourth-column operation involves only two data words, and its results are the DC components of the Fourier transform. This is the last arithmetic operation to be performed on these two bit streams, which travel through the rest of the pipeline through delay MSFFs. Since the sum and difference of this operation pass through a trivial multiplier (×1), the last adder/subtractor (A/S) column can be eliminated and the sum and difference of these two terms computed in the first column of the postadd array. This reduces pipeline latency by one clock cycle and eliminates thirty-five MSFFs. Each column of the preadders has a latency of one clock cycle. Thus, the preadd section of the WFT16 introduces four cycles of latency into the pipeline.

The multiplier array consists of an array [18][14] of multiplier cells. The 28-bit Winograd coefficients are encoded into fourteen cells using Booth's quaternary (radix-4) encoding algorithm. Each digit of the recoded coefficient is handled by one cell of the serial multiplier. Since each multiplier cell requires three delay stages, the multiplier has a total latency of forty-two (14 × 3) cycles.
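The recoding step can be illustrated with the standard radix-4 Booth algorithm. It turns a 28-bit two's-complement coefficient into fourteen digits in {-2, -1, 0, 1, 2}. This is a textbook sketch of the encoding, not the logic of the WFT16 multiplier cell, and the function name is hypothetical.

```python
def booth_radix4(coeff, bits=28):
    """Recode a `bits`-bit two's-complement coefficient into bits // 2
    radix-4 Booth digits, least significant digit first. Digit i is
    -2*y(2i+1) + y(2i) + y(2i-1), with y(-1) = 0."""
    if bits % 2:
        raise ValueError("bit count must be even")
    u = coeff & ((1 << bits) - 1)       # two's-complement bit pattern
    digits = []
    prev = 0                            # y(-1) = 0
    for i in range(0, bits, 2):
        y0 = (u >> i) & 1
        y1 = (u >> (i + 1)) & 1
        digits.append(-2 * y1 + y0 + prev)
        prev = y1                       # becomes y(2i-1) for the next digit
    return digits
```

The coefficient is recovered as the sum of digit i times 4 to the power i. Each digit selects 0, ±1, or ±2 times the multiplicand, so fourteen cells cover the 28-bit coefficient.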

The postadder, like the preadder, requires three columns of adders. Column one performs the add operation that was deferred when the fourth column of the preadd array was eliminated. Data is either real or imaginary through the first two columns of the postadder. In the third column, the two streams are mixed, resulting in complex outputs. The next stage is the parity generation and arithmetic rounding (PR) cell. At this point, the thirty-two-bit results carried through the arithmetic pipeline are rounded to twenty-three bits. The PR cell calculates odd parity on these twenty-three bits, and the parity bit is then appended to make a twenty-four-bit word. The diagram of the postadd portion of the pipeline is shown in Figure 4-3. The output leads into the serial-in, parallel-out (SIPO) register. The SIPO has the same organization as the PISO, except that the data enters bit-serially and leaves word-parallel.
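The two jobs of the PR cell can be sketched as follows. The text does not specify which 23 of the 32 bits are kept or how ties are rounded. The sketch assumes that the top 23 bits are kept, with round-half-up. The odd-parity bit goes in the MSB position, where the parity bit sits when the word enters the SIPO. The function name is hypothetical, and rounding overflow simply wraps instead of saturating.

```python
def parity_round(word32, keep=23):
    """Round a 32-bit pattern to its top `keep` bits (round half up, an
    assumption) and prepend an odd-parity bit as the MSB."""
    drop = 32 - keep                                  # 9 low-order bits removed
    rounded = ((word32 + (1 << (drop - 1))) >> drop) & ((1 << keep) - 1)
    ones = bin(rounded).count("1")
    parity = 0 if ones % 2 == 1 else 1                # make the total count odd
    return (parity << keep) | rounded
```

The resulting 24-bit word always contains an odd number of 1 bits, so the parity check at the next processor's PC/ZF cell can detect any single-bit error.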

After the MSB (which is the parity bit) has entered the SIPO, the signal LATCH_SIPO rises and drops the bits into the parallel portion of the SIPO. Every other clock cycle, a complex output is sent to the output RAM. Again, the memory addresses are supplied by the XROM.

4.4.  Processor  Decomposition 

Any one section of the processor is continually operating on a one-bit slice of a thirty-two-bit vector. Latency through the pipeline is 119 clock cycles, but once a word enters the PISO, each of its bits travels alongside the fifteen bits in the same position of the other words in the set. This alignment is maintained throughout the pipeline.

Figure 4-3. Postadd Pipeline (from multiplier to SIPO)

The WFT16 processor can be decomposed into parallel columns of functional computation units. The height of a column represents the number of bit streams (or wires) crossing the interface. The second level of decomposition is shown in Figure 4-4. For this level of decomposition, VHDL interface descriptions could be written to cover the number of bits crossing the interfaces from each column and the control signals required per column. The next level of decomposition breaks the columns into their constituent processing elements. These processing elements are the primary building blocks for the WFTA processor. A final decomposition tears these cells down into flip-flops, latches, transmission gates, and inverters. Each column in Figure 4-4 is built by stacking a number of primary circuit components. The primary circuit components for the WFT16 are the PISO_CELL, the MSFF, the A/S, the five multiplier cells, the PARITY ROUND CELL, and the SIPO_CELL. For the purposes of functionally simulating the entire circuit, these cells will be the highest level at which behavioral constructs are used. Above this level, at the column or block level, the descriptions will be purely structural. The VHDL descriptions of these cells are given in Appendix 1. Appendix 1 also contains the latch described in Chapter 3 and the lower-level subcomponents that can be used to model the primary cells structurally.


Figure 4-4. Column Form of WFT16 Processor

CHAPTER 5


WFT16  Simulation  Program 


5.1.  Overview 

The VHDL is being developed to model and simulate the VLSI and VHSIC circuits currently being designed for future defense needs. One of the needs it is intended to fill, design verification, is one that the WFTA design group currently has. Current simulation tools, such as N.2 and N.mpc, are not suitable for simulating designs such as the WFT16 processor at the functional level. Furthermore, the run times of logic level simulators, such as RNL, become excessive as the size of the circuit increases. The characteristics of the WFT16 architecture make it amenable to another approach to modeling and simulation: using a high order language with the necessary bit-level operators to develop a custom simulation tool.

The main goal of the simulation was to verify that the 16-point processor implements the 16-point Winograd Fourier Transform Algorithm using the circuits and control signal interactions built into the chip design. By viewing the processor as a set of bit streams traveling lock-stepped with respect to each other through the pipeline, it is possible to see the basic form of a high level modeling and simulation program. The interaction between the bit streams is specified by the 16-point Winograd algorithm and implemented using the hardware structures described in the preceding chapter. A more detailed description of the design and operational characteristics of these circuits is available in [1]. This chapter describes the programs which are used to simulate the processor and the data structures used to form the link between the model and the actual circuits.


5.2. Simulation Description


The simulation was designed and coded using the decomposition of the processor outlined in Chapter 4. This allows the output of the programs developed in this simulation to be compared directly with the output of a VHDL simulator. It is also directly compatible with the algorithmic simulator developed by Taylor [17]. The output of the simulator is a stream of bits for each slice of the processing elements. Follow-on efforts which implement testability in the processor can use this output to generate test vectors for hardware testing.

The simulation consists of five programs which execute in sequence under the control of a shell script to simulate the WFT16 processor. The processor architecture was partitioned in the manner shown in Figure 5-1. This partitioning allows for incremental development of the simulation using outputs from previously tested modules. It also limits the size of the individual programs, resulting in faster compilation and run times during program development. The output of each column is written to a file for analysis during coding and for future test vector generation.

The  programs  are  listed  by  name  and  the  processor  blocks  which  they  simulate: 

CS.C:  The  Control  Sequencer. 

C_CNTRL.C:  The  arithmetic  reset  and  multiplier  control  circuitry. 

PRE_WFTA.C:  The  PISO,  ZF/PC,  and  three  preadd  columns. 

MULTIPLY.C:  The  serial  multiplier. 

POST_WFTA.C: The postadder columns, the parity_round circuit, and the SIPO.

Figure 5-1. Partitioning of the Processor for Simulation

Programs were also written to aid in data analysis. The numerical performance of the WFT16 was simulated by [17]. A program was developed that performed the WFT at the algorithmic level, using double precision integers and the WFT16 equations. Taylor wrote a decimal-to-binary conversion program which was modified to compute odd parity and append it following the MSB of each input word. It is used to convert his input data sets into a form usable as input to this simulation. The loop between the output of this simulation and the algorithmic simulation was closed by writing a program which converts the binary outputs of the simulation into decimal for comparison against the results of Taylor's numerical WFT simulation.
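The decimal-to-binary and binary-to-decimal conversions described above might look like the following in C. This is a sketch under stated assumptions, not Taylor's program: input words are taken to be 23-bit two's complement values (consistent with the 24-bit PISO words once parity is added), streams are stored LSB first, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_BITS 23               /* assumption: 23-bit two's complement data */
#define WORD_BITS (DATA_BITS + 1)  /* plus the appended odd-parity bit */

/* Decimal to bit stream, LSB first; stream[DATA_BITS] holds the odd
   parity bit, appended after the MSB. */
static void to_serial_with_parity(int32_t value, int stream[WORD_BITS])
{
    assert(value >= -(1 << (DATA_BITS - 1)) && value < (1 << (DATA_BITS - 1)));
    uint32_t u = (uint32_t)value;
    int ones = 0;
    for (int i = 0; i < DATA_BITS; i++) {
        stream[i] = (int)((u >> i) & 1u);
        ones += stream[i];
    }
    stream[DATA_BITS] = (ones % 2 == 0);  /* make the total number of ones odd */
}

/* Bit stream (LSB first, parity bit ignored) back to a signed value. */
static int32_t from_serial(const int stream[WORD_BITS])
{
    int32_t v = 0;
    for (int i = 0; i < DATA_BITS; i++)
        v |= (int32_t)stream[i] << i;
    if (stream[DATA_BITS - 1])            /* sign bit set: extend to 32 bits */
        v -= (int32_t)1 << DATA_BITS;
    return v;
}
```

Round-tripping a value through both functions and comparing against the original is a quick self-check before comparing simulation output with the algorithmic results.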

5.3. Time

In VHDL, time is represented by the physical type time, whose unit may be of variable length or even infinitesimally small. The VHDL simulator keeps track of the events and transactions scheduled to occur. In this simulation, the artifice of VHDL time is replaced by a spatial separation of events: events that are scheduled to occur simultaneously are textually grouped together. The WFT16 processor controls the hardware with a two-phase clock, so the events scheduled to occur at the same point in time can be roughly partitioned into events scheduled to happen on one or the other of the two clock phases. The simulation uses two counters to model the system clock. The master clock, kept in the control sequencer, is appended to every control word. Every clock cycle the master clock is compared against an internal clock kept in each program. If they are not identical, the simulation program issues a non-synchronization message and terminates. All column outputs written to files are also tagged with the internal clock time. When this data is used by another program, the time tag is checked against that program's internal clock in the same way. This method keeps the pipeline lock-stepped during file communication. The pipelined architecture can be modeled by a program looping structure which sequentially runs through all bit manipulation and movement (shifting) operations in the pipeline. The internal counter is incremented every cycle and compared against a limit to determine when to terminate the simulation.
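The time-tag check described above can be sketched in C as follows. The function names and the array of tags are hypothetical stand-ins for the master_control file. On a mismatch, the sketch reports non-synchronization and stops the loop instead of calling exit(), so that the caller decides how to terminate.

```c
#include <stdio.h>

/* Compare a time tag with the program's internal clock. */
static int check_sync(long tag, long internal_clock)
{
    if (tag != internal_clock) {
        fprintf(stderr, "non-synchronization: tag %ld, internal clock %ld\n",
                tag, internal_clock);
        return 0;
    }
    return 1;
}

/* Run up to `limit` cycles; master_tags[c] stands in for the time tag
   read with the control word for cycle c (hypothetical layout).
   Returns the number of cycles completed in lock step. */
static long run_lockstep(const long master_tags[], long limit)
{
    long clock = 0;                      /* internal clock */
    while (clock < limit) {
        if (!check_sync(master_tags[clock], clock))
            break;                       /* a real program would terminate here */
        /* phi2 events, then phi1 events, for this cycle would go here */
        clock++;
    }
    return clock;
}
```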

A clock cycle can be defined as a φ2 event followed by a φ1 event. This definition is necessary because of the sequential nature of the simulation. The program operates in a loop: first the φ2 events occur, then the φ1 events occur. The process repeats for as many cycles as control signals are available. In the hardware, operations occur concurrently according to the phase of the clock, and φ2 and φ1 events are separated in time; in the simulation these events are separated textually. A φ2 event occurs when data available at the input is gated into a φ2 latch. Any combinational logic between a φ1 latch and a φ2 latch is also defined as a φ2 event. φ1 events are defined in a similar manner. To model the propagation of a bit through delay stages without a two-phase clock, provisions must be made to ensure that a data bit cannot affect the inputs of the succeeding stage until one simulation cycle after it was created or modified (this is similar to a VHDL signal assignment statement not being allowed to affect the current value of the signal). This is accomplished by grouping all the φ2 events at the beginning of the program and all the φ1 events at the end. This forces the φ2 latches to work with the bits put into the φ1 latches at the end of the preceding simulation cycle. For example, consider the last stage of the PISO and the following PC/ZF cell shown in Figure 5-2.

Inputs to the MSFFs are gated in by a φ2 clock pulse and are moved into the second latch by a φ1 pulse. The combinational logic which takes place between latches is simulated prior to the φ2 latch in the PC/ZF cell. The code would be written and executed in the following sequence:


Figure 5-2. PISO and PC/ZF Stage (parallel in, parallel out)


START:
    φ2 events
        Read control word.
        PISO_CELL events:
            If SD_PISO is high,
                latch data into the φ2 latch from the parallel φ1 latch above.
            If SR_PISO is high,
                latch data into the φ2 latch from the serial φ1 latch to the left.
            If LATCH_PISO is high,
                latch data from the parallel φ1 latch in the same PISO_CELL.
        PC/ZF cell events:
            If ZERO_FILL is high,
                put 0 into the φ2 latch.
            If PASS is high,
                pass the output of the PISO to the φ2 latch.
            If neither signal is high,
                do nothing.
    φ1 events
        Move parallel and serial data in the PISO from the φ2 latch into the φ1 latch.
        Increment the internal control counter.
    GO TO START and REPEAT


The data is buffered in this fashion, which keeps bits from being used by the following stage until one clock cycle after they are created. This prevents the bits from racing through the simulated arithmetic pipeline. Additional functions, such as a column of adders, can be inserted into the code by separating their φ2 events and φ1 events and placing them in sequence with respect to the components they follow in the actual hardware.
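The φ2-then-φ1 ordering can be made concrete with a chain of master-slave flip flops in C. In this sketch, the MSFF type mirrors the two-latch structure described in the text, while the function name is hypothetical. Every φ2 latch samples the φ1 latch of the preceding stage, which still holds the value from the previous cycle, and only afterwards do the φ1 latches update. A bit therefore advances exactly one stage per simulated cycle instead of racing down the chain.

```c
#include <stddef.h>

/* Master-slave flip flop: a phi2 latch feeding a phi1 latch. */
typedef struct {
    int clk2;   /* phi2 latch */
    int clk1;   /* phi1 latch, the stage's visible output */
} MSFF;

/* One simulated clock cycle of an n-stage MSFF shift chain. */
static void msff_chain_cycle(MSFF chain[], size_t n, int input)
{
    /* phi2 events: each stage samples the phi1 output of the stage before
       it, which still holds the value left by the previous cycle. */
    for (size_t i = 0; i < n; i++)
        chain[i].clk2 = (i == 0) ? input : chain[i - 1].clk1;

    /* phi1 events: each phi1 latch takes its own phi2 value. */
    for (size_t i = 0; i < n; i++)
        chain[i].clk1 = chain[i].clk2;
}
```

If the two loops were merged into one, a bit entering stage 0 would ripple to the end of the chain within a single cycle. Grouping the events as the text describes prevents exactly this race.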
5.4. Program Descriptions

This section describes the programs and data structures used in the WFTA simulation. Data structures are used as "software circuits" in the simulation. A data structure has been developed for each of the primary cells discussed in the preceding chapter. For the most part, the elements of the data structures represent clocked storage elements (D-latches) in hardware, so there is a correspondence between the latches of the real hardware and the variables declared in the structures. Figure 5-3 shows the correspondence between one data structure and the hardware it models. The structure for the +1 multiplier cell, MULTX1, has one variable for each φ2 and φ1 latch. The variable tmp_sum is an exception; it is used as a holding bin for the result of the addition operation. Assignment of this variable to the sumffclk2 latch depends on the value of the control signal sign_ext.


    struct {
        int ff1clk1;
        int ff1clk2;
        int ff2clk1;
        int ff2clk2;
        int ff3clk1;
        int ff3clk2;
        int sumffclk1;
        int sumffclk2;
        int carryffclk2;
        int carryffclk1;
        int tmp_sum;
    } MULTX1;

Figure 5-3. Example of a Simulation Data Structure


In addition to latched variables, signals which must travel through more than one level of logic may also be declared as variables in the data structures.

5.4.1. CS.C. The operation of the control sequencer is simulated by this program. The control sequencer, shown in Figure 5-4, consists of a 32-bit ring counter, a PLA, output buffers, and XROM address generation circuitry (not modeled). It generates twenty control signals which are used to control the arithmetic and I/O circuitry of the WFT processor.

Figure 5-4. WFTA Control Sequencer (input: scale factor; outputs: control signals)

This is the only program which prompts for input: the scale factor and the number of clock cycles to simulate. The output is a file, master_control, containing the number of cycles simulated, followed by a time tag and the twenty control signals; these last two items are written to the file after every simulation cycle. These control signals are active at the beginning of every simulation cycle. This file is the source of control information for all other programs used in the simulation.

The data structures used in this program are the MSFF and the SRFF (set-reset flip flop). The MSFF structure contains two variable bins for holding the contents of its two latches. The SRFF contains three variables: set, reset, and out. The set variable is used to maintain the current value of the output interval signal. The out variable must be initialized to zero prior to the start of simulation.

The ring counter is a chain of thirty-two MSFFs connected in series. In hardware, the output of the last MSFF in the chain and the input to the first are connected by a feedback loop, which allows the bit to keep cycling through the counter while the continue signal remains high. The ring counter is modeled using a counter that rolls over modulo thirty-two. If the result of the modulus operation is zero, the input to the first MSFF is set to one; otherwise, the input is set to zero. The bit advances one MSFF during each simulation cycle. Control signals are generated as a function of the position of the bit in the modeled controller. There are three basic types of control signals in the WFT16: pulse, fixed interval, and variable interval. Pulse signals are high for one clock cycle. These signals are assigned by reading the output tap of the MSFF representing a particular clock cycle: if the bit is in the φ1 latch of that MSFF, the signal is set to one, and to zero otherwise. Fixed interval signals are high over the same clock interval each time the simulation is run. These signals are modeled with a boolean expression that evaluates to true if the clock counter value is within the interval in which the signal is supposed to be active. If the expression evaluates to true, the corresponding control signal is set to one, and to zero otherwise.
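The pulse and fixed interval signals described above reduce to simple tests on the ring counter position. The C sketch below is illustrative only. The function names are hypothetical, the tap and interval values passed to them would come from the timing diagram, and the ring length is a parameter: 32 clock pulses for the 16-point design, with other transform sizes using a different length.

```c
#include <assert.h>

/* Position of the circulating bit; ring_len is 32 for the 16-point design. */
static int ring_position(long cycle, int ring_len)
{
    assert(cycle >= 0 && ring_len > 0);
    return (int)(cycle % ring_len);
}

/* Pulse signal: high only while the bit is at the MSFF tap `tap`. */
static int pulse_signal(long cycle, int ring_len, int tap)
{
    return ring_position(cycle, ring_len) == tap;
}

/* Fixed interval signal: high while the position lies in [first, last]. */
static int interval_signal(long cycle, int ring_len, int first, int last)
{
    int p = ring_position(cycle, ring_len);
    return p >= first && p <= last;
}
```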

The final class of signals is a function of the adaptive scaling algorithm. The interval over which these signals are high depends on the value of the three-bit scale factor. The eight cases are modeled using an if/then control structure as shown below:


    if condition 1
        action 1;
    else if condition 2
        action 2;
    ...
    else
        action 8;

Here each condition represents the boolean value of the "ANDing" of the scale factor and the clock counter, and each action is the setting or resetting of the set-reset variable. If a case evaluates to true, a set or reset flag is set to one. Action 8 is the default, which occurs if none of the cases evaluates to true. The SRFF function then evaluates the three variables (set, reset, and out) and sets or resets the control signals accordingly.
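The set/reset evaluation can be sketched as follows. The field names follow the SRFF description, but two things here are hypothetical placeholders: the rule that set wins when both flags are raised, and the set and reset positions (rise at ring position 0, fall at position 2*scale + 2). The real positions for each of the eight scale factors come from the WFT16 timing diagram. The eight-way if/else chain is collapsed into one formula only for brevity.

```c
/* Set-reset flip flop with the three variables named in the text. */
typedef struct {
    int set;    /* raised by a matching case */
    int reset;  /* raised by a matching case */
    int out;    /* control signal value; must start at zero */
} SRFF;

/* Evaluate once per cycle. Assumption: set wins if both flags are high;
   with neither flag high the output holds its value. */
static void srff_eval(SRFF *ff)
{
    if (ff->set)
        ff->out = 1;
    else if (ff->reset)
        ff->out = 0;
    ff->set = 0;
    ff->reset = 0;
}

/* Hypothetical variable interval signal for a 3-bit scale factor (0..7). */
static void scale_signal_cycle(SRFF *ff, int scale, int position)
{
    if (position == 0)
        ff->set = 1;
    else if (position == 2 * scale + 2)
        ff->reset = 1;
    /* default case: no flag raised, so the output holds */
    srff_eval(ff);
}
```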

5.4.2. C_CNTRL.C. This program generates the signals used to control data flow through the arithmetic pipeline. The only data structure used, an MSFF, is described in section 5.4.1. The arithmetic control circuitry consists of a chain of forty-eight MSFFs connected in series. The input to the first MSFF in the chain is the reset_add signal generated in the control sequencer. As the bit traverses the chain, it is used as a reset signal for the carry and borrow MSFFs in the preadd and postadd arrays. It also generates the four control signals needed for the multiplier cells: reset_0, reset_1, sign_ext, and rstdc. The output of the program is written to three files: one for the preadd array, one for the multiplier array, and one for the postadd array. This file structure reflects the partitioning of the arithmetic portion of the WFT16 architecture shown in Figure 5-1.


5.4.3. PRE_WFTA.C. The PRE_WFTA.C program simulates the operation of the processor from the PISO input to the φ1 latch following the third column of the preadd array. This program capitalizes on the symmetry between the real and imaginary portions of the WFT16 algorithm. Rather than writing one program to compute both real and imaginary results simultaneously, the same program can be reused and run twice, once with each data set. Since these data sets remain completely independent through two columns of the postadd array, the multiplier program can also be reused in the same fashion. Structures were defined for each of the macro-cells in the preadder: the PISO, which consists of four elements, two per flip-flop; the ZERO_FILL, which contains an MSFF structure, a latch variable, and two logic output variables; and the Adder-Subtractor, which was broken into two sections, the input and the output, with variables declared for the X, Y, carry, and borrow inputs and for the SUM, DIFF, carry, and borrow outputs. Comparison of these data structures with the circuit diagrams of the hardware described in [4] will show a one-to-one match between the variables and the outputs of the circuit components. This approach leads to a natural synthesis of the simulation program from the hardware components.

The simulation of multiple cycles is done using a loop controlled by the internal clock counter. The loop condition is set by the first word of the master_control file, which is the number of cycles for which control signals are available. While the internal clock is less than this value, the simulation proceeds. Before every simulation cycle, the program reads the master_control file and the preadd control word.

The PISO is implemented as a [16][24] array of PISO_CELL structures. The MSB of the input word is located in column sixteen of the array. The LSB, located in column 1, is shifted out first. The output of the PISO is sent to the PC/ZF, a column of 16 PC/ZF cells, where the parity bit is stripped and the word length is extended to thirty-two bits.
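At the word level, the PISO's job is to present bit i of all sixteen input words on the same clock cycle, LSB first. This is the lock-step alignment the rest of the pipeline depends on. The C sketch below abstracts away the PISO_CELL latches entirely. Its names are hypothetical, and each returned 16-bit slice packs word w's bit into bit w.

```c
#include <stdint.h>

#define PISO_WORDS 16
#define PISO_BITS  24   /* 23 data bits plus parity */

/* Bit `pos` of all sixteen words; bit w of the result comes from word w. */
static uint16_t piso_slice(const uint32_t words[PISO_WORDS], int pos)
{
    uint16_t slice = 0;
    for (int w = 0; w < PISO_WORDS; w++)
        slice |= (uint16_t)(((words[w] >> pos) & 1u) << w);
    return slice;
}

/* Unload the PISO LSB first: slices[0] is emitted on the first cycle. */
static void piso_unload(const uint32_t words[PISO_WORDS],
                        uint16_t slices[PISO_BITS])
{
    for (int pos = 0; pos < PISO_BITS; pos++)
        slices[pos] = piso_slice(words, pos);
}
```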

The preadd array, which follows the PC/ZF, is composed of three columns of adder-subtractor (A/S) cells and MSFFs. Each column of the preadd array either computes the sum and difference of its inputs or delays them for one clock cycle. The MSFFs are used as place holders to maintain bit synchronization with the other elements of the bit vector which are passing through the A/S elements. The A/S is defined to compute the sum and difference, x ± y, of the two serial input vectors; thus, the minuend is assigned to the x variable and the subtrahend to the y variable. The interconnections of the A/S cells and MSFFs of the preadd array are a function of the Winograd algorithm and are shown in Figure 5-5.

A note concerning the usage of the reset_adder signal is in order. Unlike all the other control signals, this signal is the output of a φ2 latch in the C_CNTRL circuitry. In the A/S structure, the reset signal is ANDed with the φ2 clock, which effectively causes the signal to be active on the following φ1 pulse. At this time the reset signal causes both latches to be reset. The reset signal should reset the carry and borrow following the MSB arithmetic operation of the preceding data set.
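A bit-serial A/S cell of the kind described above can be sketched in C as one full adder and one full subtractor sharing the x and y inputs, with the carry and borrow held between bit times. The names are hypothetical. The word-level driver feeds 32 bits LSB first and then clears carry and borrow, standing in for the reset_adder behavior after the MSB.

```c
#include <stdint.h>

/* Carry and borrow flip flops of one A/S cell. */
typedef struct {
    int carry;
    int borrow;
} AS_CELL;

/* One bit time: sum = x + y + carry, diff = x - y - borrow
   (x is the minuend, y the subtrahend). */
static void as_bit(AS_CELL *c, int x, int y, int *sum, int *diff)
{
    int p = x ^ y;
    *sum  = p ^ c->carry;
    *diff = p ^ c->borrow;
    c->carry  = (x & y) | (c->carry & p);
    c->borrow = (!x & y) | (!p & c->borrow);
}

/* Drive 32 bit times LSB first, then clear carry and borrow as the
   reset_adder signal does after the MSB. */
static void as_word(AS_CELL *c, uint32_t x, uint32_t y,
                    uint32_t *sum, uint32_t *diff)
{
    *sum = 0;
    *diff = 0;
    for (int i = 0; i < 32; i++) {
        int s, d;
        as_bit(c, (int)((x >> i) & 1u), (int)((y >> i) & 1u), &s, &d);
        *sum  |= (uint32_t)s << i;
        *diff |= (uint32_t)d << i;
    }
    c->carry = 0;
    c->borrow = 0;
}
```

Without the final reset, the borrow left over from a negative difference would corrupt the LSB of the next data set, which is the situation the reset_adder timing guards against.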

5.4.4. MULTIPLY.C. In hardware, the multiplier array is an [18][14] array of multiplier cells. Each cell represents one bit of the Booth's quaternary encoded binary coefficient. In software, the serial multiplier is represented by an array of data structures, each array element being one of the five possible multiplier cells. The data structures are declared to be external so that all variables hold their values between function calls. An example of the multiplier data structure was shown in Figure 5-3.

Figure 5-5. Preaddition Operations in the WFTA Processor

The simulation proceeds by columns. The mult_cntrl file consists of a time tag and fourteen sets of four bits each, which are the four control signals for a column. Before the φ2 event of each column of eighteen cells, the program reads in the control word for that column. Next, the partial product and data bit are read into the cell structure representing a particular location in the array. Finally, the function which simulates the multiplier cell is called to evaluate the bits and shift the data through the MSFFs. The arguments of this function call are the pointer to the data structure and the control signals needed for that particular cell. A pointer name such as pa70 conveys information about the location of the structure in the array. The p-7- means that the multiplier cell is in the seventh row of the array. The pa-- means that the cell is in the tenth column. Finally, the p--0 means that it is a 0 multiplier cell. Therefore, pa70 is the pointer to the multiplier array element in the tenth column of the seventh row, and the program calls the m0 function for it. The other multiplier identifiers are: p--1 for the +1 multiplier, p--2 for the +2 multiplier, p--n for the -1 multiplier, and p--q for the -2 multiplier. The theory of operation of the multiplier cells is covered in detail in [4] and will not be repeated here.
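The five cell types correspond to the five digit values of Booth's quaternary (radix-4) encoding: 0, +1, +2, -1, and -2. The following C sketch shows the standard radix-4 recoding, together with a mapping from each digit to the identifier letter used in the pointer names. The function names are hypothetical. This is the textbook recoding and not necessarily the exact procedure used to generate the coefficients in [4].

```c
#include <assert.h>
#include <stdint.h>

/* Radix-4 (modified Booth) recoding of an nbits-wide two's complement
   coefficient. digits[i] is in {-2,-1,0,+1,+2} and carries weight 4^i. */
static void booth4_recode(int32_t coeff, int nbits, int digits[])
{
    assert(nbits > 0 && nbits % 2 == 0 && nbits <= 32);
    uint32_t u = (uint32_t)coeff;
    for (int i = 0; i < nbits / 2; i++) {
        int lo  = (i == 0) ? 0 : (int)((u >> (2 * i - 1)) & 1u); /* b[2i-1] */
        int mid = (int)((u >> (2 * i)) & 1u);                      /* b[2i]   */
        int hi  = (int)((u >> (2 * i + 1)) & 1u);                  /* b[2i+1] */
        digits[i] = -2 * hi + mid + lo;
    }
}

/* Map a Booth digit to the cell identifier letter of the pointer names. */
static char booth_cell_id(int digit)
{
    switch (digit) {
    case  0: return '0';
    case  1: return '1';
    case  2: return '2';
    case -1: return 'n';
    case -2: return 'q';
    default: return '?';
    }
}
```

For example, the 6-bit coefficient 7 recodes to the digits -1, +2, 0 (least significant first), since -1 + 2*4 = 7. These digits would select an n cell and a 2 cell.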


5.4.5. POST_WFTA.C. This program simulates the WFTA pipeline from the output of the multipliers to the output of the SIPO. It is a dual program in that both the real and imaginary operations are simulated simultaneously. The A/S elements are the same as in the preadders, and the SIPO is essentially the same as the PISO. The only totally new data structure used is the parity round cell. Because the parity round cell consists of several levels of combinational logic, variables were declared for the outputs of the logic as well as for the standard latches. The interconnection of the postadd columns is shown in Figure 5-6. Results from the imaginary and real sections of the processor are mixed in the third column of the postadders.


Figure  5-6.  Post  Addition  Operations  in  the  WFTA  Processor 


The outputs of the real and imaginary columns, and the outputs of both SIPOs, are written to output files.

5.5.  Generation  of  a  WFT  Simulation 

This section discusses considerations involved in building a WFT simulation from the data structures and functions defined in earlier sections. The cells were designed to be single independent units, so larger computational units may be created merely by declaring more instances of the cell structure. The key parameter in all the programs is the number of clock pulses which constitute a simulation cycle. The 16-point architecture uses 32 clock pulses, a 15-point architecture would use 30, and a 17-point architecture would use 34. The loop control functions are all done with modulo (number of clock pulses) arithmetic. An important point regarding the control signals is that they are set high in the control sequencer on the cycle in which they are to be used. In actual hardware, the signals might be created a clock cycle ahead of time and buffered in an output MSFF. The timing diagram used as a source document should be examined to determine which interpretation was used in generating the diagram. The constructs used in modeling the control sequencer were selected to allow changes to be made in the timing diagram without requiring extensive changes to the simulation.

The PRE_WFTA.C and POST_WFTA.C adder-subtractor (A/S) elements are interconnected using the equations from the Winograd program written by Taylor, and the coefficients used to generate the multiplier array were obtained from [4]. The outputs from the algorithmic simulation program were also used to verify the results of the simulation.

Implementing the serial multiplier array as a fixed array of data structures is a flexible and easily understandable approach: the coefficient encoding can be changed without having to redo the entire array. The major difficulty encountered in constructing the simulation was the timing of events across program boundaries. Reading data from files is normally a φ2 event (the start of the simulation cycle). On the other hand, writing to a file is a φ1 event (the end of the cycle). The clock time appended is the cycle in which the data was created. However, when a file is read, the program treats the data as being applicable to that simulation cycle. Problems arise when the output of one program, such as PRE_WFTA.C, is used as the input to another, such as MULTIPLY.C. The multiplier treats the preadd outputs as input data valid on the same clock cycle in which they were created, so in effect the data arrives one cycle before it was created. In many cases the effect is barely noticeable, showing up as an error in the LSB of some of the answers, and it is very hard to detect. In some answers, those with just the right number of sign extensions, the one-cycle misalignment between the control signals and the data causes the MSB of the data, the sign bit, to overflow, changing the sign of the intermediate result. The fix was simple once the problem was identified: data read across program boundaries was defined to be a φ1 event, so that the data is effectively read at the end of the simulation cycle in which it was written to the file. The source document for the control signals defined the clock cycle on which the signals were to be active, and the CS.C program generated them on that cycle, so this problem did not affect them. Once this problem was detected and corrected, building the complete simulation essentially consisted of interconnecting the data structures in the manner specified by the 16-point WFT algorithm and debugging programming errors.
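The corrected hand-off can be illustrated with a one-word buffer in C. The consumer's φ2 logic uses the value read at the end of the previous cycle, and the read itself happens as a φ1 event. The types and names are hypothetical, and mismatched time tags are simply ignored in this sketch rather than checked.

```c
/* Hypothetical record written by a producer: time tag plus one data bit. */
typedef struct {
    long tag;
    int bit;
} TaggedBit;

/* Consumer side of a file boundary. */
typedef struct {
    int pending;   /* value read (phi1) at the end of the previous cycle */
} Boundary;

/* phi2: hand the pending value to the consumer's logic for this cycle.
   phi1: read the word the producer tagged with this same cycle, so it is
   first used one cycle after it was created. */
static int boundary_cycle(Boundary *b, const TaggedBit *produced, long clock)
{
    int seen = b->pending;
    if (produced->tag == clock)
        b->pending = produced->bit;
    return seen;
}
```

A word created on cycle t is therefore first seen by the consumer on cycle t + 1, the same one-cycle delay a φ1-to-φ2 latch pair gives inside a single program.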

5.6.  Simulation  Scenario 

C shell scripts were written to automate the execution of the simulation programs. Execution was subdivided into two scripts: generation of the control signals and simulation of the arithmetic operations.

The script control executes the programs CS.C and C_CNTRL.C. CS.C is the only program that requires input from the keyboard; it prompts for the number of clock cycles to simulate and the scale factor of the input data set. The scenario is shown in Figure 5-7. Control files are generally good for multiple simulations, so they do not have to be regenerated unless the scale factor of the input data changes.
Figure 5-7. Control Generation Simulation Scenario


The script demo runs the arithmetic simulation programs. The scenario that the script executes is shown in Figure 5-8. In addition, the script runs the output format programs, which convert the binary streams into integers. In the absence of any operator action, the converted output scrolls across the screen, so the normal procedure is to redirect the screen output to a data file with the command demo >& tst_output, which sends the output to the file tst_output.

The code used to simulate the WFT16 is included in Appendix 2.
CHAPTER 6


Summary  and  Conclusions 


6.1. Overview

This thesis addressed the problem of modeling and simulating a VHSIC class signal processor. A new hardware description language, VHDL, is being developed by the VHSIC program office to address this need. VHDL is intended to be a medium for communicating design intent for large, complex designs in a concise manner. Perhaps more importantly, the VHDL code used to describe the circuit also serves as the input to a hierarchical simulator. The simulator allows a design to be simulated at different levels of abstraction within the same design entity. This is a key concern for large designs, which may be composed of hundreds of thousands of transistors.

VHDL was originally targeted to be the vehicle used to simulate the design of a signal processor that embedded the Winograd Fourier Transform Algorithm in a pipelined architecture. An early delivery of the VHDL simulator failed to materialize, so a custom simulation tool was developed to perform this task. The work done in decomposing and modeling the circuit using VHDL translated well into the new C simulation. This simulation modeled the processor at the bit level, using hardware-like data structures. It was used to validate the architecture, cell functionality, and control/timing interaction of the WFT16 processor. Insights obtained during the development and coding of the simulation were also useful in correcting errors which had slipped into the processor design.

6.2. VHDL

VHDL was applied to the problem of modeling the WFT16 processor at a level where the functionality of the individual circuits may be observed. This level was determined to be the level at which bits could be seen passing through latched storage locations. This visibility was achieved by decomposing the circuit into its smallest functional processing components and then modeling these components. It was found to be quite easy to model low-level cells, such as inverters, T_gates, and latches, and to build progressively larger circuit models by instantiating the previously described subcomponents. This approach models and simulates circuit operation at the very lowest level of detail; the CMOS latch was modeled at the transistor behavior level in this way. However, the run time of circuit simulations containing large numbers of devices modeled at this level is expected to be excessively long. Because VHDL provides a hierarchical approach, the circuit could instead be modeled at a higher level of abstraction for the purposes of simulation. This alternate approach, modeling only the bare functionality, coupled with limiting redundant event scheduling, should allow large circuits to simulate much more efficiently.
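To make the benefit of limiting redundant event scheduling concrete, the following C fragment is a minimal sketch, not code from this thesis. It assumes a hypothetical linear chain of two-input AND gates, one input of each gate being a fixed enable, and assumes the chain has settled before a new input arrives. With suppression on, a gate's successor is evaluated only if the gate's output actually changed; because the chain is linear, this reduces to stopping at the first unchanged output.

```c
#include <assert.h>
#include <stdbool.h>

#define N_GATES 8

/* Hypothetical netlist: gate i computes out[i] = in_i AND enable[i],
   where in_0 is the primary input and in_i = out[i-1] for i > 0. */
typedef struct {
    bool enable[N_GATES];
    bool out[N_GATES];
} AndChain;

/* Propagate a new primary input down a settled chain and return the
   number of gate evaluations performed. With suppress == true, an
   unchanged output schedules no event, so evaluation stops there;
   otherwise every downstream gate is evaluated unconditionally. */
int propagate(AndChain *c, bool input, bool suppress)
{
    int evaluations = 0;
    bool in = input;
    for (int i = 0; i < N_GATES; i++) {
        bool next = in && c->enable[i];
        bool changed = (next != c->out[i]);
        c->out[i] = next;
        evaluations++;
        if (suppress && !changed)
            break;          /* no output event: nothing downstream to schedule */
        in = next;
    }
    return evaluations;
}
```

With the second gate disabled, the suppressed run evaluates 2 gates instead of 8 while leaving every output identical.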

The WFT16 processor was decomposed into a set of lower-level behaviors and modeled at both the functional and structural levels. The structural description is useful for seeing the architecture of a cell, while the functional description is more abstract. Functional descriptions may provide a clear picture of the device behavior, but their primary purpose is to aid efficient simulation. Thus, two VHDL architectural bodies were written for most cells: one to document the architecture, and the other to describe the function and drive a simulation. The MSFF was found to be the highest-level cell that could be efficiently modeled using this approach, modeling the functionality while preserving some structural flavor of the design. Higher-level cells, modeled at the functional level, instantiate an MSFF as part of the overall design.

The basic concepts behind VHDL were found to be relatively simple. The syntax allows VHDL descriptions to be written that are clear and concise. It appears to be difficult to write a description that would not be fairly readable and understandable to someone with a basic knowledge of the language. However, this absence of complexity leads to descriptions that are tedious to write. There are also many areas of the syntax where questions arise as to the actual meaning or implications of a particular construct. This is not unexpected, since it arises when learning any new computer language, but the problem is exacerbated by the youth of the language and the lack of documentation and examples.

There appear to be two aspects of VHDL: design documentation and design simulation. Of the two, only documentation is fully supported by the language at this point in time. There also appear to be two types of description that correspond to these aspects, because the goals of modeling and of simulating a circuit seem to be served by separate approaches. A purely structural description will not simulate as efficiently as one specifically written for that purpose. On the other hand, circuit descriptions written with an eye toward minimizing simulation time will not be as clear in describing the circuit structure. The MIXED type of architectural description alleviates this problem somewhat, but the actual improvement from using abstract descriptions alone is not known.

6.3. C Simulation

The WFT16 processor was modeled and simulated using the C programming language. Primitive cell structures were defined to model each of the primary circuits at the bit level. These primitive cells were then declared and interconnected into a netlist implementing the 16-point WFTA. A clock was defined and used to march bit streams through this cell structure. In this fashion, the WFT16 architecture was shown to perform the 16-point DFT, thereby validating the architecture, the results of Taylor's numerical simulation, and the signal-to-noise ratio projections.
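The lock-step clocking idea can be illustrated with a small C sketch. The Cell and Netlist structures below are hypothetical and much simpler than the simulator in Appendix 2: every cell is a one-bit register, and each clock is split into a sample phase and a commit phase so that the order in which cells are listed in the netlist cannot change the result.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CELLS 16

/* Hypothetical bit-level cell: a one-bit register whose input comes
   from another cell's output, or from the primary input if src < 0. */
typedef struct {
    int  src;    /* index of the driving cell, or -1 for the primary input */
    bool q;      /* output visible to other cells */
    bool next;   /* value sampled this clock, not yet visible */
} Cell;

typedef struct {
    Cell cells[MAX_CELLS];
    int  n;
} Netlist;

/* One lock-step clock. Phase 1 samples every cell's input from the
   outputs as they were before the clock; phase 2 commits all new
   values together, so evaluation order does not matter. */
void clock_tick(Netlist *net, bool primary_in)
{
    assert(net->n >= 0 && net->n <= MAX_CELLS);
    for (int i = 0; i < net->n; i++) {
        int s = net->cells[i].src;
        net->cells[i].next = (s < 0) ? primary_in : net->cells[s].q;
    }
    for (int i = 0; i < net->n; i++)
        net->cells[i].q = net->cells[i].next;
}
```

In a three-register pipeline built this way, the cells can be listed in reverse pipeline order and the last stage still shows, after clock t, the input applied at clock t-2.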

6.4. VHDL Recommendations

The run time of any VHDL-driven simulation needs to be quantified, using both functional and structural descriptions. The run-time improvement from techniques such as suppressing event scheduling unless the input-output transform will cause an event on the output, and removing redundant circuit functions (such as the feedback loop of a static latch), should be studied. If the difference is not significant, the designer will have the flexibility to use more structurally flavored circuit models, which are easier to write, in the simulation. Finally, the descriptions should be validated to ensure that they do in fact perform identical functions and may be interchanged at will.
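As one way of carrying out such a validation, the following C sketch exhaustively compares two models of the adder-subtractor cell of Appendix 1: one mirroring its gate-level structural description (each wired-OR node modeled as the OR of its drivers, and A_NOT derived from A rather than supplied as an input), and one using the equations of its Boolean algebra description. The C names are illustrative only; with four single-bit inputs, 16 cases cover the whole input space.

```c
#include <assert.h>
#include <stdbool.h>

/* Outputs of one adder/subtractor cell. */
typedef struct { bool sum, cy_out, diff, by_out; } AddSubOut;

/* Gate-level model following the structural description.
   Each wired-OR node is the OR of all AND gates that drive it. */
AddSubOut add_sub_structural(bool a, bool b, bool cy, bool by)
{
    bool a_not = !a, b_not = !b;
    bool s1 = (a && b) || (a_not && b_not);   /* CA3, CA4: A xnor B */
    bool s2 = (a && b_not) || (a_not && b);   /* CA1, CA2: A xor B  */
    bool t1 = !cy, t2 = !by, t3 = !a;         /* I1, I2, I3         */
    bool s5 = a || b, s6 = a && b;            /* O1, A3             */
    AddSubOut o;
    o.sum    = (cy && s1) || (s2 && t1);      /* A1, A2 */
    o.cy_out = (s6 && t1) || (cy && s5);      /* A4, A5 */
    o.diff   = (by && s1) || (s2 && t2);      /* A6, A7 */
    o.by_out = (by && s1) || (t3 && s2);      /* A8, A9 */
    return o;
}

/* Functional model: the Boolean equations of the logic description. */
AddSubOut add_sub_logic(bool a, bool b, bool cy, bool by)
{
    bool s1 = (a == b), s2 = (a != b);        /* xnor, xor */
    AddSubOut o;
    o.sum    = (s1 && cy) || (s2 && !cy);
    o.cy_out = ((a || b) && cy) || (a && b && !cy);
    o.diff   = (s2 && !by) || (s1 && by);
    o.by_out = (s1 && by) || (s2 && !a);
    return o;
}

/* Compare both models on all 16 input combinations and return the
   number of combinations on which any output differs. */
int count_mismatches(void)
{
    int bad = 0;
    for (int v = 0; v < 16; v++) {
        bool a = v & 1, b = v & 2, cy = v & 4, by = v & 8;
        AddSubOut s = add_sub_structural(a, b, cy, by);
        AddSubOut f = add_sub_logic(a, b, cy, by);
        if (s.sum != f.sum || s.cy_out != f.cy_out ||
            s.diff != f.diff || s.by_out != f.by_out)
            bad++;
    }
    return bad;
}
```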

At this point in VHDL's life cycle, the documentation aspect is fully supported by the syntax. This capability should be used to document the cell structure and other parameters of the cells developed during the VLSI courses. These include the name of the cell, the names of its subcells, and design information that would be useful in automating the layout process at some point in the future. The yearly turnover of personnel in the AFIT environment, as well as the complexity of the cells, requires that clear, structured documentation exist to aid the continuity of the research effort. Thus the major recommendation in this area is that all VLSI design groups use VHDL to document the CMOS cells built over the course of the last year and in future years.

6.5. Simulation Recommendations

Although the simulation was designed to simulate the WFT16 processor, the design philosophy is applicable to the other processors in the PFA pipeline, the WFT15 and WFT17, and also to other architectures that have lock-stepped, bit-serial pipelines. It models the hardware at the bit level, and its primary advantage is a very short run time: under one CPU minute to run several 16-point data sets through the pipeline. The C simulator is a tool that should grow along with the research in pipelined serial signal processing architectures. Any design that uses the cells designed in the WFT16 effort can use the structures and concepts of the simulator. At this time, a class project is developing a program that will lay out and simulate the multiplier array for the WFT processors. Future projects in this area could include automating the layout of the other WFT processors, leading to computer-generated signal processor layout.


6.6. Conclusions

The WFT16 architecture has the potential for an order-of-magnitude improvement in processor throughput over existing designs. Based on the research discussed in this report and the reports of the other members of the design team [17], [4], [13], there is a high degree of confidence that the WFT16 processor will work as expected on first silicon.




Appendix 1. VHDL Modeling

The WFTA processor was decomposed into its primary circuits in Chapter 4. The very lowest level of decomposition shows that all the circuit elements are built from transmission gates and the basic logic gates. These cells are linked together to build latches, logic gates, and flip-flops. This appendix contains the VHDL models of the primary circuits. The devices modeled range from transmission gates to the Booth's multiplier cells.

1.1. Transmission Gate

This is a behavioral description of a transmission gate. The gate actually needs both senses of the control signal to drive the CMOS 'P' and 'N' transistors, but since the inverted signal does not perform any independent function, it is not included in the port list or the architectural description. The T_gate is sensitive both to the input signal switching and to transitions on the control line, so both signals are included in the process sensitivity list. However, if control = '0', the output will not change regardless of the value of the input, and the process statement reflects this consideration. The input is enabled as soon as control switches to '1'. As long as control remains high, the output reflects the input; when control falls, the input signal is disabled and is not allowed to cause events in the transaction queue.
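The same behavior can be sketched in C for a bit-level simulator. This is an illustration under stated assumptions, not the thesis's code: ZBit is a hypothetical three-valued type standing in for Z_BIT, the initial output is assumed to be 'Z', and an "event" is counted whenever the output value changes.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ZB_0, ZB_1, ZB_Z } ZBit;   /* stand-in for Z_BIT */

typedef struct {
    ZBit out;      /* current output value */
    int  events;   /* number of output changes caused so far */
} TGate;

/* Evaluate the transmission gate after a change on bit_in or control.
   While control is high the output follows the input; while control is
   low the input is disabled and cannot cause an output event. */
void t_gate_eval(TGate *g, ZBit bit_in, bool control)
{
    if (!control)
        return;                  /* input disabled: output unchanged */
    if (g->out != bit_in) {
        g->out = bit_in;         /* input enabled: output tracks input */
        g->events++;
    }
}
```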


--***********************************************************************
--
-- DATE:     29 JULY 1985
-- TITLE:    Transmission Gate Descriptions
-- FILENAME: t_gate.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity T_gate
  ( bit_in:  in  Z_bit;
    control: in  CONTROL;
    bit_out: out Z_bit ) is
end T_gate;

--***********************************************************************


architecture BEHAVIOR_1 of T_gate is
block
begin
  process (bit_in, control)
  begin
    -- enable the input on a rising transition of control
    if (control and not control'stable) then
      enable bit_in;
    end if;

    -- while control is low, input changes cannot cause events
    if (not control) then
      disable bit_in;
    end if;

    bit_out <= bit_in;
  end process;
end block;
end BEHAVIOR_1;


1.2. Z_INVERTER

The Z_INVERTER accepts inputs of type Z_BIT and returns type BIT. '0' and '1' inputs are inverted and mapped to the output in the normal fashion. A 'Z' input has no effect on the output, which retains its previous value. The port mode "buffer" is used to allow the output to be assigned to itself when the input is 'Z'. This device is used in the latch to model a high-impedance input behaviorally.
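A minimal C sketch of the same rule follows, assuming a hypothetical three-valued ZBit type for Z_BIT and plain 0/1 integers for BIT. The previous output is passed in explicitly, playing the role of the buffer port that the VHDL reads back.

```c
#include <assert.h>

typedef enum { ZB_0, ZB_1, ZB_Z } ZBit;   /* stand-in for Z_BIT */

/* Z_INVERTER: '0' and '1' are inverted; a 'Z' input leaves the
   previously driven output unchanged, modeling the charge stored on
   a high-impedance node. prev_out must be 0 or 1. */
int z_inverter(ZBit bit_in, int prev_out)
{
    assert(prev_out == 0 || prev_out == 1);
    switch (bit_in) {
    case ZB_0: return 1;
    case ZB_1: return 0;
    default:   return prev_out;   /* 'Z': retain previous value */
    }
}
```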

--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    Z INVERTER
-- FILENAME: z-invarch.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity Z_INVERTER
  ( bit_in:  in     Z_BIT;
    bit_out: buffer BIT ) is
end Z_INVERTER;

--***********************************************************************

architecture BEHAVIOR of Z_INVERTER is
block
begin
  process (bit_in)
  begin
    if (bit_in = 'Z') then
      bit_out <= bit_out;   -- high-impedance input: hold previous value
    elsif (bit_in = '1') then
      bit_out <= '0';
    else
      bit_out <= '1';
    end if;
  end process;
end block;
end BEHAVIOR;


1.3. Full Adder-Subtractor

This section contains two descriptions of the full adder-subtractor used in the pre-add and post-add arrays. The first is a completely structural description listing all the logic gates and interconnections. The second is a Boolean algebra description of the functionality of the circuit.
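It is worth checking that the Boolean equations themselves implement one bit of addition and subtraction. The following C sketch (function names are illustrative) evaluates the equations of the Boolean description for all 16 combinations of A, B, carry-in and borrow-in, and checks that a + b + cy = 2*cy_out + sum and a - b - by = diff - 2*by_out.

```c
#include <assert.h>

/* Boolean equations of the adder/subtractor cell on 0/1 integers. */
void add_sub_cell(int a, int b, int cy, int by,
                  int *sum, int *cy_out, int *diff, int *by_out)
{
    int s1 = !(a ^ b);   /* A xnor B */
    int s2 = a ^ b;      /* A xor B  */
    *sum    = (s1 & cy) | (s2 & !cy);
    *cy_out = ((a | b) & cy) | (a & b & !cy);
    *diff   = (s2 & !by) | (s1 & by);
    *by_out = (s1 & by) | (s2 & !a);
}

/* Return 1 if, for every input combination,
     a + b + cy == 2*cy_out + sum   and
     a - b - by == diff - 2*by_out,
   otherwise 0. */
int check_against_arithmetic(void)
{
    for (int v = 0; v < 16; v++) {
        int a = v & 1, b = (v >> 1) & 1;
        int cy = (v >> 2) & 1, by = (v >> 3) & 1;
        int sum, cy_out, diff, by_out;
        add_sub_cell(a, b, cy, by, &sum, &cy_out, &diff, &by_out);
        if (a + b + cy != 2 * cy_out + sum) return 0;
        if (a - b - by != diff - 2 * by_out) return 0;
    }
    return 1;
}
```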

--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    LOGIC LEVEL DESCRIPTION OF AN ADDER/SUBTRACTOR
-- FILENAME: add-logic.v
-- PROJECT:  THESIS
-- LANGUAGE: VHDL
--
-- ENTITY:

entity ADD_SUB2
  ( A, A_NOT, B, CY, BY:       in  bit;
    SUM, DIFF, CY_OUT, BY_OUT: out bit ) is
end ADD_SUB2;

-- FUNCTION:
--   This is a pure structural description of the ADDER/SUBTRACTOR
--   cell used in the PREADD and POSTADD columns of the WFTA. Because
--   of the CMOS transmission gates used, it is necessary to input
--   both A and A_NOT.
--
--***********************************************************************

architecture PURE_STRUCTURE of ADD_SUB2 is
PURE_STRUCTURE:
block
  component OR_GATE   port (A, B: in bit; C: out bit);
  component AND_GATE  port (A, B: in bit; C: out bit);
  component XOR_GATE  port (A, B: in bit; C: out bit);
  component XNOR_GATE port (A, B: in bit; C: out bit);
  component INVERTER  port (A: in bit; C: out bit);

  signal B_NOT, S5, S6, T1, T2, T3: bit;
  signal S1, S2, SUM, CY_OUT, DIFF, BY_OUT: atomic WIRED_OR bit;
begin
  -- signals S1 and S2 are common to both the adder and subtractor
  CI1: INVERTER port (B, B_NOT);
  CA1: AND_GATE port (A, B_NOT, S2);
  CA2: AND_GATE port (A_NOT, B, S2);
  CA3: AND_GATE port (A, B, S1);
  CA4: AND_GATE port (A_NOT, B_NOT, S1);

  -- the gates labeled C.. make up the XNOR and XOR functions
  -- given by X1 and X2 below; this was done to make this description
  -- compatible with the actual circuitry implemented in CMOS
  -- X1: XNOR_GATE port (A, B, S1);   -- S1 = A xnor B
  -- X2: XOR_GATE  port (A, B, S2);   -- S2 = A xor B

  -- adder section
  A1: AND_GATE port (CY, S1, SUM);      -- CY (A xnor B)
  I1: INVERTER port (CY, T1);           -- CY'
  A2: AND_GATE port (S2, T1, SUM);      -- CY' (A xor B)
  O1: OR_GATE  port (A, B, S5);         -- (A or B)
  A3: AND_GATE port (A, B, S6);         -- (A and B)
  A4: AND_GATE port (S6, T1, CY_OUT);   -- CY' (A and B)
  A5: AND_GATE port (CY, S5, CY_OUT);   -- CY (A or B)

  -- subtractor section
  A6: AND_GATE port (BY, S1, DIFF);     -- BY (A xnor B)
  I2: INVERTER port (BY, T2);           -- BY'
  A7: AND_GATE port (S2, T2, DIFF);     -- BY' (A xor B)
  I3: INVERTER port (A, T3);            -- A'
  A8: AND_GATE port (BY, S1, BY_OUT);   -- BY (A xnor B)
  A9: AND_GATE port (T3, S2, BY_OUT);   -- A' (A xor B)
end block;
end PURE_STRUCTURE;


-- This is a Boolean algebra description of the adder/subtractor cell.
architecture LOGIC_STRUCTURE of ADD_SUB2 is
block
  signal S1, S2: bit;   -- temporary signals for the xnor, xor resources
begin
  S1 <= A xnor B;
  S2 <= A xor B;

  -- adder section
  SUM    <= (S1 and CY) or (S2 and not CY);
  CY_OUT <= ((A or B) and CY) or (A and B and not CY);

  -- subtractor section
  DIFF   <= (S2 and not BY) or (S1 and BY);
  BY_OUT <= (S1 and BY) or (S2 and not A);
end block;
end LOGIC_STRUCTURE;


1.4. Resettable CMOS Latch

The resettable latch is used as the front end of the resettable MSFF. The reset signal is ANDed with CLK_NOT to avoid fighting at the input node. The VHDL code follows the normal latch, but with the reset signal added to the interface declaration and to the process statement. The reset signal is meant to take precedence over the input (a direct connection to ground will drain any charge on the node); for this reason it is tested first in the process statement.
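The precedence order described above can be sketched in C as a single evaluation function. The sketch is a simplification with hypothetical names: it returns the new value of the latch's internal input node (the node the VHDL model drives before the output inverter), does not model the inverting output stage, and assumes the caller supplies both clock phases.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ZB_0, ZB_1, ZB_Z } ZBit;   /* stand-in for Z_BIT */

/* Return the new 0/1 value of the latch's internal input node, given
   its previous value (held through the feedback path). Priority:
   1. reset, gated by clk_not, pulls the node to 0;
   2. otherwise a driven input passed by the clocked input T_gate wins;
   3. otherwise the node keeps its fed-back value. */
int reset_latch_node(int node, ZBit bit_in, bool clk, bool clk_not, bool reset)
{
    assert(node == 0 || node == 1);
    if (reset && clk_not)
        return 0;
    ZBit t_gate_out = clk ? bit_in : ZB_Z;
    if (t_gate_out != ZB_Z)
        return t_gate_out == ZB_1;
    return node;
}
```

Gating reset with clk_not means a reset request arriving while the input gate is open has no effect, which is the "no fighting at the input node" property the text describes.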


--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    RESETABLE CMOS LATCH
-- FILENAME: rlatcharch.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity RESET_LATCH
  ( bit_in:  in     Z_BIT;
    CLK:     in     clk_signal;
    CLK_NOT: in     clk_signal := 1;
    reset:   in     CONTROL;
    bit_out: buffer BIT ) is
end RESET_LATCH;

--***********************************************************************

architecture MIXED_DESCRIPTION of RESET_LATCH is
block
  signal T_GATE_OUT, INVERT_IN: Z_BIT;
  signal L_FDBK, INVERT_OUT:    BIT;

  component T_GATE     port (a: in Z_BIT; cntrl: in CONTROL; x: out Z_BIT);
  component Z_INVERTER port (b: in Z_BIT; y: buffer Z_BIT);
  component INVERTER   port (c: in BIT; z: out BIT);

  for all: T_GATE use   -- This is a mapping between the ports declared
                        -- in the component declarations and the
                        -- formal ports listed in the interface
                        -- declaration.
    entity (T_GATE)
    port map (BIT_IN => a, CONTROL => cntrl, BIT_OUT => x)
    body (BEHAVIOR);
  end for;

  for all: Z_INVERTER use
    entity (Z_INVERTER)
    port map (BIT_IN => b, BIT_OUT => y)
    body (BEHAVIOR);
  end for;

  for all: INVERTER use
    entity (INVERTER)
    body (<<library>>);
  end for;
begin
  T1: T_GATE     port (BIT_IN, CLK, T_GATE_OUT);
  Z1: Z_INVERTER port (INVERT_IN, BIT_OUT);
  I1: INVERTER   port (BIT_OUT, INVERT_OUT);
  T2: T_GATE     port (INVERT_OUT, CLK_NOT, L_FDBK);

  process (RESET, T_GATE_OUT, L_FDBK, CLK, CLK_NOT)
  begin
    -- respond to the clocks only if a latch input has just changed
    if (not RESET'stable or not T_GATE_OUT'stable or not L_FDBK'stable) then
      enable CLK, CLK_NOT;
    else
      disable CLK, CLK_NOT;
    end if;

    -- reset (gated by CLK_NOT) takes precedence over the input
    if ((RESET = '1') and (CLK_NOT = '1')) then
      INVERT_IN <= '0';
    elsif (T_GATE_OUT /= 'Z') then
      INVERT_IN <= T_GATE_OUT;
    else
      INVERT_IN <= L_FDBK;
    end if;
  end process;
end block;
end MIXED_DESCRIPTION;


1.5. Set-Reset Flip-Flop

The set-reset flip-flop (SRFF) is used to maintain interval signals in the control sequencer and to store the parity error flag in the PC/ZF column. The SRFF is composed of three latches and some CMOS transistors. This cell can be modeled in two ways: by component instantiation and by process statements. The process-statement version may be a little more unwieldy, but it should execute more efficiently.
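The process-based description can be approximated by the following C sketch, which splits the flip-flop into a CLK2 step that samples SET and RESET while OPERATE is high and a CLK1 step that updates the output. The structure and function names are hypothetical, and the sketch follows the process-based architecture rather than the latch-instantiation version.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool set_out, reset_out;   /* values captured during CLK2 */
    bool sr_out;               /* flip-flop output */
} Srff;

/* CLK2 phase: sample SET and RESET, but only while OPERATE is high. */
void srff_clk2(Srff *f, bool operate, bool set, bool reset)
{
    assert(!(set && reset));   /* mirrors the entity's assertion */
    if (operate) {
        f->set_out = set;
        f->reset_out = reset;
    }
}

/* CLK1 phase: set has priority, then reset; otherwise hold. */
void srff_clk1(Srff *f)
{
    if (f->set_out)
        f->sr_out = true;
    else if (f->reset_out)
        f->sr_out = false;
}
```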

--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    SET-RESET ARCHITECTURE
-- FILENAME: srff-arch.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity SRFF
  ( OPERATE:            in     BIT;
    SET, RESET:         in     Z_BIT;
    CLK2, CLK1:         in     clk_signal;
    CLK2_NOT, CLK1_NOT: in     clk_signal := 1;
    SR_OUT:             buffer BIT ) is
  assert (not (SET and RESET))
    report "SET AND RESET ARE BOTH HIGH SIMULTANEOUSLY"
    severity error;
end SRFF;

-- FUNCTION:
--
--***********************************************************************

architecture STRUCTURE of SRFF is
block
  signal PASS_RESET, SET_OUT, RESET_OUT, TO_OUT: Z_BIT;
begin
  L_S:   LATCH port (SET, CLK2, CLK2_NOT, SET_OUT);
  L_R:   LATCH port (PASS_RESET, CLK2, CLK2_NOT, RESET_OUT);
  L_OUT: LATCH port (TO_OUT, CLK1, CLK1_NOT, SR_OUT);

  PASS_RESET <= RESET when OPERATE = '1'
                else '1';

  TO_OUT <= SET_OUT when SET = '1'
            else RESET_OUT;
end block;
end STRUCTURE;
architecture BEHAVIOR of SRFF is
block
  component LATCH port (A: in Z_bit; CLK: in clk_signal; X: buffer bit);

  for all: LATCH use
    entity (LATCH)
    port map (bit_in => A, CLK => CLK, CLK_NOT => open, bit_out => X)
    body (STRUCTURE);
  end for;

  signal PASS_RESET, SET_OUT, RESET_OUT, TO_OUT: Z_BIT;
begin
  -- If either the set or the reset has just changed, the process will be
  -- sensitive to clock transitions.
  process (OPERATE, SET, RESET, CLK2)
  begin
    if (OPERATE = '1') then
      enable SET, RESET;
    else
      disable SET, RESET;
    end if;

    if not (SET'STABLE and RESET'STABLE) then
      enable CLK2;
    end if;

    SET_OUT <= SET;
    RESET_OUT <= RESET;
  end process;

  process (SET_OUT, RESET_OUT, CLK1)
  begin
    if (not SET_OUT'stable or not RESET_OUT'stable) then
      enable CLK1;
    else
      disable CLK1;
    end if;

    if SET_OUT = '1' then
      SR_OUT <= '1';
    elsif RESET_OUT = '1' then
      SR_OUT <= '0';
    else
      SR_OUT <= SR_OUT;
    end if;
  end process;
end block;
end BEHAVIOR;
1.6. Master-Slave Flip-Flop

The MSFF is composed of two latches connected through the signal s1 and its converted form s2. The type conversion function convb_z is used to convert the BIT output s1 of the first latch to the Z_BIT type that the input of the second latch expects. This will be a common problem throughout the cell descriptions. An explicit conversion mechanism is used to confirm that the design intent was to connect a tristate signal to the input of a gate. Once more, process scheduling is optimized by enabling clk1, and hence an update of s2, only if s1 has just changed state; otherwise s2 remains stable and does not fire a transaction.
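For comparison with a bit-level C simulation, the following minimal sketch (hypothetical names, not the Appendix 2 code) models the MSFF as two level-sensitive latches: the first is transparent while CLK2 is high and the second while CLK1 is high, with the two clocks assumed to be non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool s1;    /* master (CLK2) latch output */
    bool out;   /* slave (CLK1) latch output  */
} Msff;

/* Apply one clock phase. A latch copies its input while its clock is
   high and holds its value otherwise. */
void msff_phase(Msff *ff, bool bit_in, bool clk2, bool clk1)
{
    assert(!(clk2 && clk1));       /* non-overlapping clocks assumed */
    if (clk2) ff->s1 = bit_in;     /* master latch */
    if (clk1) ff->out = ff->s1;    /* slave latch  */
}

/* One full clock cycle: the CLK2 phase followed by the CLK1 phase. */
void msff_cycle(Msff *ff, bool bit_in)
{
    msff_phase(ff, bit_in, true, false);
    msff_phase(ff, bit_in, false, true);
}
```

An input change during the CLK1 phase does not reach the output, because the master latch is closed; that isolation is what makes the cell usable in a lock-stepped serial pipeline.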

--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    MASTER SLAVE FLIP/FLOP ARCHITECTURE
-- FILENAME: msff-arch.v
-- OPERATING SYSTEM: VMS
-- LANGUAGE: VHDL
--
-- ENTITY:

entity MSFF
  ( bit_in:             in  Z_BIT;
    CLK2, CLK1:         in  clk_signal;
    CLK2_NOT, CLK1_NOT: in  clk_signal := 1;
    bit_out:            out BIT ) is
end MSFF;

-- FUNCTION:
--   This is a description of a non-resettable flip-flop. The signal s1
--   connects the PHI 1 and PHI 2 latches.
--
--***********************************************************************

architecture STRUCTURE of MSFF is
block
  component LATCH port (A: in Z_bit; CLK: in clk_signal; X: buffer bit);

  for all: LATCH use
    entity (LATCH)
    port map (bit_in => A, CLK => CLK, CLK_NOT => open, bit_out => X)
    body (STRUCTURE);
  end for;
  -- configuration of latch using a block configuration statement

  signal s1: BIT;     -- local signal within the MSFF
  signal s2: Z_BIT;
begin
  L1: LATCH port (bit_in, clk2, s1);
  L2: LATCH port (s2, clk1, bit_out);

  process (s1, clk1, clk1_not)
  begin
    if (not s1'stable) then
      enable clk1;
    else
      disable clk1;
    end if;
    s2 <= convb_z(s1);
  end process;
end block;
end STRUCTURE;

-- The output of the phi1 latch must be
-- converted to type Z_BIT to avoid a type
-- clash.
--***********************************************************************

architecture BEHAVIOR of MSFF is
-- This is an alternate modeling of the MSFF.
-- Note the simplicity of the modeling; it will show the same behavior
-- at the interface ports as the much more detailed model above.
-- The level of detail required in the simulation determines which
-- VHDL modeling approach should be taken, simple or complex.
block (not bit_in'stable)
  signal s1: BIT;   -- local signal within the MSFF
begin
  s1 <= memoried bit_in when CLK2 = 1;
  bit_out <= s1 when CLK1 = 1;
end block;
end BEHAVIOR;


1.7. PISO

This is the input shift register cell for the WFT processors. Data enters word-parallel and leaves bit-serial. Input data is 23 numeric bits plus one parity bit. Latch causes the input to be moved down into the serial path. Shift Right moves the data out serially. Shift Down moves the data down one level in the register.
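A word-level C sketch of the PISO behavior follows. The order in which bits leave the register (least significant bit first), the placement of the parity bit in the top position, and all names are assumptions for illustration; the VHDL descriptions remain the reference definition of the cell.

```c
#include <assert.h>
#include <stdint.h>

#define PISO_BITS 24                          /* 23 numeric bits + 1 parity bit */
#define PISO_MASK ((1u << PISO_BITS) - 1u)

typedef struct {
    uint32_t parallel;   /* word held in the parallel (shift-down) stage */
    uint32_t serial;     /* word held in the serial (shift-right) stage  */
} Piso;

/* Shift Down: capture a new parallel input word. */
void piso_shift_down(Piso *p, uint32_t word) { p->parallel = word & PISO_MASK; }

/* Latch: move the captured word down into the serial path. */
void piso_latch(Piso *p) { p->serial = p->parallel; }

/* Shift Right: emit one bit (LSB first, an assumption) and shift the
   serial input bit s_in into the top position. */
int piso_shift_right(Piso *p, int s_in)
{
    int out = (int)(p->serial & 1u);
    p->serial = (p->serial >> 1) | ((uint32_t)(s_in & 1) << (PISO_BITS - 1));
    return out;
}
```

Keeping the parallel stage separate from the serial stage lets the next word be loaded while the current word is still being shifted out, which is the reason for the two rows of flip-flops in the cell.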


--***********************************************************************
-- DATE:     29 AUG 1985
--
-- TITLE:    Parallel In, Serial Out Shift Register
-- FILENAME: piso
-- LANGUAGE: VHDL
--
-- ENTITY:

entity PISO_CELL
  ( P_IN, S_IN:         in     Z_BIT;
    CLK2, CLK1:         in     clk_signal;
    CLK2_NOT, CLK1_NOT: in     clk_signal := 1;
    P_SHIFT_DOWN, P_SHIFT_RIGHT, P_LATCH: in CONTROL;
    S_OUT:              buffer BIT;
    P_OUT:              buffer BIT ) is
  assert (not (P_LATCH and P_SHIFT_RIGHT))
    report "LATCH_PISO AND SHIFT_RIGHT_PISO ARE BOTH HIGH"
    severity warning;
end PISO_CELL;

--***********************************************************************

architecture STRUCTURE of PISO_CELL is
-- this is a purely structural description of the PISO cell.
block
  component MSFF port (A: in Z_bit;
                       CLK2, CLK2_NOT, CLK1, CLK1_NOT: in clk_signal;
                       B: buffer bit);

  component T_GATE port (X: in Z_bit;
                         CLK: in clk_signal; Y: out bit);

  for all: MSFF use
    entity (MSFF)
    port map (bit_in => A,
              CLK2 => CLK2, CLK2_NOT => CLK2_NOT,
              CLK1 => CLK1, CLK1_NOT => CLK1_NOT,
              bit_out => B)
    body (MIXED_BODY);
  end for;

  for all: T_GATE use
    entity (T_GATE)
    port map (bit_in => convb_z(X),
              clk => clk, bit_out => Y)
    body (BEHAVIOR);
  end for;

  signal PARALLEL_IN: Z_bit;
  signal SERIAL_IN:   atomic LATCH_RESOLVE Z_BIT;
begin
  T1: T_GATE port (P_IN, P_SHIFT_DOWN, PARALLEL_IN);
  M1: MSFF   port (PARALLEL_IN, CLK2, CLK2_NOT, CLK1, CLK1_NOT, P_OUT);
  T2: T_GATE port (S_IN, P_SHIFT_RIGHT, SERIAL_IN);
  T3: T_GATE port (P_OUT, P_LATCH, SERIAL_IN);
  M2: MSFF   port (SERIAL_IN, CLK2, CLK2_NOT, CLK1, CLK1_NOT, S_OUT);
end block;
end STRUCTURE;


-- this is a mixed behavioral/structural description of the unit PISO cell;
-- note the open clk_not ports.

architecture MIXED of PISO_CELL is
block
  component MSFF port (A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

  for all: MSFF use
    entity (MSFF)
    port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
              CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
    body (BEHAVIOR);
  end for;

  signal PARALLEL_IN, SERIAL_IN: Z_BIT;
begin
  P_FF: MSFF port (PARALLEL_IN, CLK2, CLK1, P_OUT);
  S_FF: MSFF port (SERIAL_IN, CLK2, CLK1, S_OUT);

  -- parallel path: P_IN can cause events only while P_SHIFT_DOWN is high
  process (P_SHIFT_DOWN, P_IN)
  begin
    if (P_SHIFT_DOWN = '1') then
      enable P_IN;
    else
      disable P_IN;
    end if;

    if P_SHIFT_DOWN = '1' then
      PARALLEL_IN <= P_IN;
    end if;
  end process;

  -- serial path: shift right takes S_IN, latch takes the parallel word
  process (P_SHIFT_RIGHT, P_LATCH, S_IN, P_OUT)
  begin
    if (P_SHIFT_RIGHT = '1' or P_LATCH = '1') then
      enable P_OUT, S_OUT;
    else
      disable P_OUT, S_OUT;
    end if;

    if (P_SHIFT_RIGHT = '1') then
      SERIAL_IN <= S_IN;
    elsif (P_LATCH = '1') then
      SERIAL_IN <= convb_z(P_OUT);
    else
      SERIAL_IN <= 'Z';
    end if;
  end process;
end block;
end MIXED;


1.8. Adder-Subtractor

This is the structure of the adder-subtractor used in the WFT pre-adds and post-adds. It is made up of a combinational logic adder/subtractor, CLK1 latches for the input bits, two MSFFs to hold the CARRY and BORROW for the next data bits, and CLK2 latches for holding the result on the way to the next stage. The carry and borrow MSFFs are reset on the leading CLK2 latches. It should be noted that the inputs are inverted, and inverted again at the outputs.
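The serial arithmetic performed by this cell can be sketched in C at the bit level. The sketch assumes operands arrive least significant bit first and that the carry and borrow flip-flops are cleared at the start of each word; it ignores the latch delays and the input and output inversions, so it shows what is computed, not when. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Carry and borrow flip-flops of the cell. */
typedef struct { int cy, by; } AddSubState;

/* Clear both flip-flops at the start of a word. */
void add_sub_reset(AddSubState *st) { st->cy = 0; st->by = 0; }

/* One bit time: combine bits x and y with the stored carry and borrow,
   produce one sum bit and one difference bit, and update the state. */
void add_sub_clock(AddSubState *st, int x, int y, int *sum, int *diff)
{
    int s = x ^ y;
    *sum  = s ^ st->cy;
    *diff = s ^ st->by;
    st->cy = (x & y) | (st->cy & s);    /* carry out  */
    st->by = (!x & y) | (st->by & !s);  /* borrow out */
}

/* Feed two n-bit words through the cell LSB first; the serial
   results are therefore (x + y) and (x - y) modulo 2^n. */
void add_sub_words(uint32_t x, uint32_t y, int n,
                   uint32_t *sum, uint32_t *diff)
{
    assert(n > 0 && n <= 32);
    AddSubState st;
    add_sub_reset(&st);
    *sum = 0;
    *diff = 0;
    for (int i = 0; i < n; i++) {
        int s, d;
        add_sub_clock(&st, (int)((x >> i) & 1u), (int)((y >> i) & 1u), &s, &d);
        *sum  |= (uint32_t)s << i;
        *diff |= (uint32_t)d << i;
    }
}
```

Resetting the carry and borrow at each word boundary is what keeps the last carry of one word from leaking into the first bit of the next.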

--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    ADDER/SUBTRACTOR CELL
-- FILENAME: add.v
-- PROJECT:  THESIS
-- LANGUAGE: VHDL
--
-- ENTITY:

with WFTA_PACKAGE; use WFTA_PACKAGE;
entity ADD_SUB_CELL
  ( BIT_X, BIT_Y:       in     bit;
    RST:                in     control;
    CLK2, CLK1:         in     clk_signal;
    CLK2_NOT, CLK1_NOT: in     clk_signal := 1;
    SUM, DIFF:          buffer bit ) is
end ADD_SUB_CELL;

-- FUNCTION:
--
--***********************************************************************

architecture PURE_STRUCTURE of ADD_SUB_CELL is
PURE_STRUCTURE:
block
  component ADD_SUB port (A, A_NOT, B, CY_IN, BY_IN: in bit;
                          SUM, DIFF, CY_OUT, BY_OUT: out bit);

  for RA1: ADD_SUB use
    entity (ADD_SUB)
    port map (A => A, A_NOT => A_NOT, B => B, CY_IN => CY,
              BY_IN => BR_IN, SUM => SUM, DIFF => DIFF,
              CY_OUT => CY_OUT, BY_OUT => BR_OUT)
    body (LOGIC_STRUCTURE);
  end for;

  component RMSFF port (A: in Z_bit; CLK2, CLK1: in clk_signal;
                        RST: in control; X: buffer bit);

  for all: RMSFF use
    entity (RMSFF)
    port map (bit_in => convb_z(A), CLK2 => CLK2, CLK2_NOT => open,
              CLK1 => CLK1, CLK1_NOT => open, RST_FF => RST,
              bit_out => X)
    body (BEHAVIOR);
  end for;

  component LATCH port (A: in bit; CLK, CLK_NOT: in clk_signal;
                        B: buffer bit);

  for all: LATCH use
    entity (LATCH)
    port map (BIT_IN => A, CLK => CLK, CLK_NOT => open, BIT_OUT => B)
    body (BEHAVIOR);
  end for;

  component INVERTER port (A: in bit; B: out bit);

  for all: INVERTER use
    entity (INVERTER)
    port map (open)
    body (<<LIBRARY>>);
  end for;

  -- signals for real section
  signal S1, SA, SB, S_NOTA, S_CY, S_BY, S_SUM, S_DIFF, S_CY_OUT,
         S_BY_OUT: bit;
begin
  -- adder/subtractor
  RI1:  INVERTER port (A_IN, S1);
  RL1:  LATCH    port (A_IN, CLK1, SA);
  RL2:  LATCH    port (B_IN, CLK1, SB);
  RL3:  LATCH    port (convb_z(S1), CLK1, S_NOTA);
  RA1:  ADD_SUB  port (SA, S_NOTA, SB, S_CY, S_BY, S_SUM, S_DIFF,
                       S_CY_OUT, S_BY_OUT);
  RFF1: RMSFF    port (convb_z(S_CY_OUT), CLK2, CLK1, RESET, S_CY);
  RFF2: RMSFF    port (convb_z(S_BY_OUT), CLK2, CLK1, RESET, S_BY);
  RL4:  LATCH    port (convb_z(S_SUM), CLK2, SUM);
  RL5:  LATCH    port (convb_z(S_DIFF), CLK2, DIFF);
end block;
end PURE_STRUCTURE;
1.9. Zero Multiplier

The following descriptions structurally describe the interconnections of the Lyons multipliers. The descriptions are essentially identical; the primary difference between the cells is which MSFF output tap feeds the input to the adder, and which control signals are used as inputs.
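Taken together, a chain of such cells multiplies the serial data by a fixed coefficient. The following C sketch is a word-level abstraction under stated assumptions, not a model of the Lyons cells themselves: it assumes, as the names of the zero and plus-one multiplier cells suggest, that each cell corresponds to one digit of the coefficient, and it allows signed digits in {-1, 0, +1}. A 0 digit contributes only a shift (no adder), while a nonzero digit adds or subtracts the suitably shifted data. The function name and digit encoding are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by a fixed coefficient given as signed digits, least
   significant digit first: coeff = sum of digits[k] * 2^k, with each
   digit in {-1, 0, +1}. A 0 digit contributes nothing beyond the
   shift; a nonzero digit adds or subtracts the shifted data. */
int64_t signed_digit_multiply(int32_t x, const int8_t *digits, int n_digits)
{
    assert(n_digits >= 0 && n_digits <= 31);
    int64_t product = 0;
    for (int k = 0; k < n_digits; k++) {
        assert(digits[k] >= -1 && digits[k] <= 1);
        int64_t shifted = (int64_t)x * ((int64_t)1 << k);  /* x * 2^k */
        product += digits[k] * shifted;
    }
    return product;
}
```

For example, the digits 1, 0, -1, 0, 1 (least significant first) encode 1 - 4 + 16 = 13, so only three of the five cells would need an adder.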

--***********************************************************************
--
-- DATE:     29 AUG 1985
-- TITLE:    Zero Multiplier for the WFT processor
-- FILENAME: m0
-- LANGUAGE: VHDL
--
-- ENTITY:

entity MULT_0
  ( DATA_IN:              in     BIT;
    P_PROD_IN:            in     BIT;
    CLK2, CLK1:           in     clk_signal;
    CLK2_NOT, CLK1_NOT:   in     clk_signal := 1;
    SIGN_EXT:             in     M_CONTROL;
    P_PROD_OUT, DATA_OUT: buffer BIT ) is
end MULT_0;

-- FUNCTION:
--   This cell implements the 0 case for the modified Lyons serial
--   multiplier architecture. There is no adder cell nor carry
--   flip-flop used in this circuit. It is primarily a shift register.
--
--***********************************************************************

architecture STRUCTURE of MULT_0 is
block
  component MSFF port (A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

  for all: MSFF use
    entity (MSFF)
    port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
              CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
    body (BEHAVIOR);
  end for;

  signal FF0_OUT, FF1_OUT, FF2_OUT: bit;
  signal PROD_IN: Z_BIT;
begin
  FF0:     MSFF port (convb_z(DATA_IN), CLK2, CLK1, FF0_OUT);
  FF1:     MSFF port (convb_z(FF0_OUT), CLK2, CLK1, FF1_OUT);
  FF2:     MSFF port (convb_z(FF1_OUT), CLK2, CLK1, DATA_OUT);
  FF_PROD: MSFF port (PROD_IN, CLK2, CLK1, P_PROD_OUT);

  -- P_PROD_IN is passed on when SIGN_EXT = '1'; otherwise the node floats
  PROD_IN <= convb_z(P_PROD_IN) when SIGN_EXT = '1'
             else 'Z';
end block;
end STRUCTURE;


1.10. 

Plus  One  Multiplier 

--***********************************************************************
--
-- DATE: 29 AUG 1985
--
-- TITLE: Plus one serial multiplier
-- FILENAME: m1
-- LANGUAGE: VHDL
--
-- ENTITY:

entity MULTp1
   (DATA_IN: in BIT;
    P_PROD_IN: in BIT;
    CLK2, CLK1 : in clk_signal;
    CLK2_NOT, CLK1_NOT : in clk_signal := 1;
    RESET_0, SIGN_EXT: in M_CONTROL;
    P_PROD_OUT, DATA_OUT : buffer BIT) is
end MULTp1;

-- FUNCTION: Plus one serial multiplier for the WFT.
--           This cell needs the reset-to-0 and the
--           sign-extend control signals. The input to
--           the adder comes from the output of the second
--           MSFF in the data chain.
--
--***********************************************************************


architecture STRUCTURE of MULTp1 is
block

   component ADD_SUB port(A, A_NOT, B, CY_IN, BY_IN: in bit;
                          SUM, DIFF, CY_OUT, BY_OUT: out bit);

   for RA1: ADD_SUB use
      entity (ADD_SUB)
      port map (A => A, A_NOT => A_NOT, B => B, CY_IN => CY,
                BY_IN => BR_IN, SUM => SUM, DIFF => DIFF,
                CY_OUT => CY_OUT, BY_OUT => BR_OUT)
      body (LOGIC_STRUCTURE);
   end for;

   component RMSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                        RST: in control; X: buffer bit);

   for all: RMSFF use
      entity (RMSFF)
      port map (bit_in => convb_z(A), CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, RST_FF => RST,
                bit_out => X)
      body (BEHAVIOR);
   end for;

   component MSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

   for all: MSFF use
      entity (MSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
      body (BEHAVIOR);
   end for;

   signal FF0_OUT, FF1_OUT, FF2_OUT: bit;
   signal PROD_IN: Z_BIT;

begin

   DP0: MSFF port(convb_z(DATA_IN), CLK2, CLK1, FF0_OUT);
   DP1: MSFF port(convb_z(FF0_OUT), CLK2, CLK1, FF1_OUT);
   DP2: MSFF port(convb_z(FF1_OUT), CLK2, CLK1, DATA_OUT);

   A1: ADDER port(FF1_OUT, CY_IN, P_PROD_IN, CARRY, SUM);

   FF_PROD: MSFF port(convb_z(SUM_IN), CLK2, CLK1, P_PROD_OUT);

   FF_CARRY: R_MSFF port(convb_z(CARRY), CLK2, CLK1, RESET_0, CY_IN);

   SUM_IN <= convb_z(SUM) when SIGN_EXT = '1'
             else 'Z';

end block;
end STRUCTURE;
1.11.

Plus  Two  Multiplier 

--***********************************************************************
--
-- DATE: 29 AUG 1985
--
-- TITLE: Plus two multiplier
-- FILENAME: m2
-- LANGUAGE: VHDL
--
-- ENTITY:

entity MULTp2
   (DATA_IN: in BIT;
    P_PROD_IN: in BIT;
    CLK2, CLK1 : in clk_signal;
    CLK2_NOT, CLK1_NOT : in clk_signal := 1;
    RESET_0, RSTDC, SIGN_EXT: in M_CONTROL;
    P_PROD_OUT, DATA_OUT: buffer BIT) is
end MULTp2;

-- FUNCTION: Implements the +2 multiplier for the WFT.
--           This cell takes the input to the adder from
--           the output of the last flip-flop in the chain.
--
--           It also requires the signal RSTDC, which is
--           ANDed with the data path input.
--
--***********************************************************************

architecture STRUCTURE of MULTp2 is
block

   component ADD_SUB port(A, A_NOT, B, CY_IN, BY_IN: in bit;
                          SUM, DIFF, CY_OUT, BY_OUT: out bit);

   for RA1: ADD_SUB use
      entity (ADD_SUB)
      port map (A => A, A_NOT => A_NOT, B => B, CY_IN => CY,
                BY_IN => BR_IN, SUM => SUM, DIFF => DIFF,
                CY_OUT => CY_OUT, BY_OUT => BR_OUT)
      body (LOGIC_STRUCTURE);
   end for;

   component RMSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                        RST: in control; X: buffer bit);

   for all: RMSFF use
      entity (RMSFF)
      port map (bit_in => convb_z(A), CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, RST_FF => RST,
                bit_out => X)
      body (BEHAVIOR);
   end for;

   component MSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

   for all: MSFF use
      entity (MSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
      body (BEHAVIOR);
   end for;

   signal FF0_OUT, FF1_OUT, FF2_OUT: bit;
   signal DATA_PATH_IN, SUM_IN: BIT;

begin

   DP0: MSFF port(convb_z(DATA_IN), CLK2, CLK1, FF0_OUT);
   DP1: MSFF port(convb_z(FF0_OUT), CLK2, CLK1, FF1_OUT);
   DP2: MSFF port(convb_z(FF1_OUT), CLK2, CLK1, DATA_OUT);

   A1: ADDER port(DATA_PATH_IN, CY_IN, P_PROD_IN, CARRY, SUM);
   FF_PROD: MSFF port(convb_z(SUM_IN), CLK2, CLK1, P_PROD_OUT);
   FF_CARRY: R_MSFF port(convb_z(CARRY), CLK2, CLK1, RESET_0, CY_IN);

   SUM_IN <= convb_z(SUM) when SIGN_EXT = '1'
             else 'Z';

   DATA_PATH_IN <= (RSTDC and FF2_OUT);
end block;
end STRUCTURE;




1.12. 

Negative  One  Multiplier 

--***********************************************************************
--
-- DATE: 29 AUG 1985
--
-- TITLE: Negative one multiplier cell
-- FILENAME: multn1.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity MULTn1
   (DATA_IN: in BIT;
    P_PROD_IN: in BIT;
    CLK2, CLK1 : in clk_signal;
    CLK2_NOT, CLK1_NOT : in clk_signal := 0;
    RESET_1, SIGN_EXT: in M_CONTROL;
    P_PROD_OUT, DATA_OUT: buffer BIT) is
end MULTn1;

-- FUNCTION: This is the negative one multiplier cell. The carry
--           flip-flops in the negative cells are reset to one
--           instead of zero as in the positive case.
--
--***********************************************************************
architecture STRUCTURE of MULTn1 is
block

   component ADD_SUB port(A, A_NOT, B, CY_IN, BY_IN: in bit;
                          SUM, DIFF, CY_OUT, BY_OUT: out bit);

   for RA1: ADD_SUB use
      entity (ADD_SUB)
      port map (A => A, A_NOT => A_NOT, B => B, CY_IN => CY,
                BY_IN => BR_IN, SUM => SUM, DIFF => DIFF,
                CY_OUT => CY_OUT, BY_OUT => BR_OUT)
      body (LOGIC_STRUCTURE);
   end for;

   component MSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

   for all: MSFF use
      entity (MSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
      body (BEHAVIOR);
   end for;

   component RHMSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                         RST_1: in M_CONTROL; X: buffer bit);

   for all: RHMSFF use
      entity (RHMSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open,
                RST_1 => RST_1, bit_out => X)
      body (BEHAVIOR);
   end for;

   signal FF0_OUT, FF1_OUT, FF2_OUT: bit;
   signal ADD_IN, SUM: bit;

begin

   DP0: MSFF port(convb_z(DATA_IN), CLK2, CLK2_NOT, CLK1,
                  CLK1_NOT, FF0_OUT);

   DP1: MSFF port(convb_z(FF0_OUT), CLK2, CLK2_NOT, CLK1,
                  CLK1_NOT, FF1_OUT);

   DP2: MSFF port(convb_z(FF1_OUT), CLK2, CLK2_NOT, CLK1,
                  CLK1_NOT, DATA_OUT);

   A1: ADDER port(ADD_IN, CY_IN, P_PROD_IN, CARRY, SUM);

   FF_PROD: MSFF port(convb_z(SUM_IN), CLK2, CLK2_NOT, CLK1,
                      CLK1_NOT, P_PROD_OUT);

   FF_CARRY: R_MSFF port(convb_z(CARRY), CLK2, CLK2_NOT, CLK1, CLK1_NOT,
                         RESET_1, CY_IN);

   SUM_IN <= convb_z(SUM) when SIGN_EXT = '1'
             else 'Z';

   ADD_IN <= not(FF1_OUT);
end block;
end STRUCTURE;


1.13.

Negative Two Multiplier

--***********************************************************************
--
-- DATE: 29 AUG 1985
--
-- TITLE: Negative two multiplier
-- FILENAME: multn2.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity MULTn2
   (DATA_IN: in BIT;
    P_PROD_IN: in BIT;
    CLK2, CLK1 : in clk_signal;
    CLK2_NOT, CLK1_NOT : in clk_signal := 0;
    RESET_1, RSTDC, SIGN_EXT: in M_CONTROL;
    P_PROD_OUT, DATA_OUT: buffer BIT) is
end MULTn2;

--***********************************************************************


architecture STRUCTURE of MULTn2 is
block

   component ADD_SUB port(A, A_NOT, B, CY_IN, BY_IN: in bit;
                          SUM, DIFF, CY_OUT, BY_OUT: out bit);

   for RA1: ADD_SUB use
      entity (ADD_SUB)
      port map (A => A, A_NOT => A_NOT, B => B, CY_IN => CY,
                BY_IN => BR_IN, SUM => SUM, DIFF => DIFF,
                CY_OUT => CY_OUT, BY_OUT => BR_OUT)
      body (LOGIC_STRUCTURE);
   end for;

   component MSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

   for all: MSFF use
      entity (MSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
      body (BEHAVIOR);
   end for;

   component RHMSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                         RST_1: in M_CONTROL; X: buffer bit);

   for all: RHMSFF use
      entity (RHMSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open,
                RST_1 => RST_1, bit_out => X)
      body (BEHAVIOR);
   end for;

   signal FF0_OUT, FF1_OUT, FF2_OUT: bit;
   signal DATA_PATH_IN, SUM_IN: BIT;

begin

   DP0: MSFF port(convb_z(DATA_IN), CLK2, CLK1, FF0_OUT);
   DP1: MSFF port(convb_z(FF0_OUT), CLK2, CLK1, FF1_OUT);
   DP2: MSFF port(convb_z(FF1_OUT), CLK2, CLK1, DATA_OUT);

   A1: ADDER port(DATA_PATH_IN, CY_IN, P_PROD_IN, CARRY, SUM);

   FF_PROD: MSFF port(convb_z(SUM_IN), CLK2, CLK1, P_PROD_OUT);

   FF_CARRY: R_MSFF port(convb_z(CARRY), CLK2, CLK1,
                         RESET_0, CY_IN);

   SUM_IN <= convb_z(SUM) when SIGN_EXT = '1'
             else 'Z';

   DATA_PATH_IN <= (RSTDC nand FF2_OUT);
end block;

end STRUCTURE;


1.14. 

Parity  Round  Cell 

--***********************************************************************
--
-- DATE: 29 AUG 1985
--
-- TITLE: PARITY ROUND CELL
-- FILENAME: prcell.v
-- LANGUAGE: VHDL
--
-- ENTITY:

entity PRCELL
   (PR_IN: in bit;
    P_CALC, R_CALC, P_APPEND: in CONTROL;
    CLK2, CLK1 : in clk_signal;
    CLK2_NOT, CLK1_NOT : in clk_signal := 1;
    PR_OUT: buffer bit) is
end PRCELL;

-- FUNCTION: THIS CELL COMPUTES THE PARITY BIT AND ROUNDS THE RESULT
--           OUT OF THE POST-ADDERS.
--
--***********************************************************************

architecture MIXED of PRCELL is
block

   component LATCH port(A: in Z_bit; CLK: in clk_signal;
                        X: buffer bit);

   for all: LATCH use
      entity (LATCH)
      port map (bit_in => A, CLK => CLK, CLK_NOT => open, bit_out => X)
      body (STRUCTURE);
   end for;

   component MSFF port(A: in Z_bit; CLK2, CLK1: in clk_signal;
                       X: buffer bit);

   for all: MSFF use
      entity (MSFF)
      port map (bit_in => A, CLK2 => CLK2, CLK2_NOT => open,
                CLK1 => CLK1, CLK1_NOT => open, bit_out => X)
      body (BEHAVIOR);
   end for;

   signal ROUND_AND, ROUND_OR, ROUND_OUT, IN_XOR, PARITY_XOR,
          PARITY_OR, PARITY_OUT: bit;

begin

   -- ROUNDING SECTION

   L_IN: LATCH port(PR_IN, CLK1, BIT_IN);
   IN_XOR <= (BIT_IN xor ROUND_OUT);

   ROUND_AND <= (BIT_IN and ROUND_OUT);

   ROUND_OR <= (ROUND_AND or R_CALC);

   RND_MSFF: MSFF port(ROUND_OR, CLK2, CLK1, ROUND_OUT);

   -- PARITY SECTION

   PARITY_XOR <= (IN_XOR xor PARITY_OUT);

   PARITY_OR <= (PARITY_XOR or P_CALC);

   PAR_MSFF: MSFF port(PARITY_OR, CLK2, CLK1, PARITY_OUT);
   TO_OUT <= IN_XOR when P_APPEND = '1'
             else PARITY_OUT;

   L_OUT: LATCH port(TO_OUT, BIT_PR_OUT);
end block;

end MIXED;
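Read as gate equations, the rounding section is a bit-serial incrementer whose carry is ROUND_OUT (R_CALC forces that carry to one), and the parity section accumulates the parity of the rounded bits in PARITY_OUT (P_CALC forces it to one). The C sketch below steps those equations one clock at a time. It assumes the parity path combines IN_XOR and PARITY_OUT with an exclusive-OR, as the signal name PARITY_XOR indicates, and that R_CALC and P_CALC were both high on the clock just before the word. The struct and function names are hypothetical, and the output latch and P_APPEND selection are left out.

```c
/* State of the two MSFFs in this hypothetical PRCELL model. */
struct prcell_state {
    unsigned round_out;   /* ROUND_OUT: carry of the serial incrementer */
    unsigned parity_out;  /* PARITY_OUT: running parity */
};

/* One clock of the cell, following the gate equations in the listing
   (parity path taken as XOR). Returns IN_XOR, the rounded data bit. */
static unsigned prcell_step(struct prcell_state *s, unsigned bit_in,
                            unsigned r_calc, unsigned p_calc)
{
    unsigned in_xor = (bit_in ^ s->round_out) & 1u;
    unsigned round_or = ((bit_in & s->round_out) | r_calc) & 1u;
    unsigned parity_or = ((in_xor ^ s->parity_out) | p_calc) & 1u;

    s->round_out = round_or;      /* both MSFFs load on the same clock */
    s->parity_out = parity_or;
    return in_xor;
}

/* Feeds an nbits-long word LSB first, assuming R_CALC and P_CALC were
   high on the preceding clock (carry and parity both start at 1).
   Returns the rounded word and stores the final parity bit. */
static unsigned prcell_word(unsigned word, int nbits, unsigned *parity)
{
    struct prcell_state s = { 1u, 1u };
    unsigned out = 0;
    int i;

    for (i = 0; i < nbits; i++)
        out |= prcell_step(&s, (word >> i) & 1u, 0u, 0u) << i;
    *parity = s.parity_out;
    return out;
}
```

Under those assumptions, prcell_word(11, 4, &par) returns 12 with par = 1: the preset carry adds one to the word, and the preset parity register leaves an odd-parity bit.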


Appendix  2 

C  Simulation  Programs 


This appendix contains the programs used to simulate the WFT16 processor, together with the binary-decimal and decimal-binary conversion programs. The programs are listed in the order in which they are encountered in the pipeline: Control Sequencer, Column Controller, Pre_WFTA.c, Multiplier, Post Adders, and the conversion programs bin.c and form_16.c.
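The programs hand results to one another through text files. cs.c, for instance, writes the cycle count to master_control and then, for every clock, the clock count followed by twenty control bits; c_cntrl.c reads rst_add from position 16 of that record, and pre_wfta.c reads mult_round from position 7. A minimal reader for one such record might look like the following. It is a sketch: the function name is hypothetical, and it adds the fscanf return-value checks the original programs omit.

```c
#include <stdio.h>

#define N_SIGNALS 20   /* control signals per clock, as written by cs.c */

/* Reads one time-tagged record (clock count followed by N_SIGNALS values)
   from a master_control-style stream. Returns 1 on success, 0 otherwise. */
static int read_control_record(FILE *fp, int *clk_count,
                               int signals[N_SIGNALS])
{
    int i;

    if (fscanf(fp, "%d", clk_count) != 1)
        return 0;
    for (i = 0; i < N_SIGNALS; i++)
        if (fscanf(fp, "%d", &signals[i]) != 1)
            return 0;
    return 1;
}
```

A caller first reads the leading cycle count with fscanf, then calls read_control_record once per simulated clock until it returns 0.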


1.1.  CS.c 


/**********************************************************************
** DATE: 15 AUG 1985
**
** TITLE: Control Sequencer Simulation Program
**
** FILENAME: cs.c
** COORDINATOR: Jim Collins.
**
** PROJECT: THESIS
**
** FUNCTION: Simulates the control sequencer for the WFTA.
**           Requires the number of control cycles to
**           generate control signals for, and the scale factor
**           of the input data.
**
** FILES WRITTEN: master_control: contains a time-tagged control
**                word for the WFTA processor.
**
** FILES INCLUDED: sr.c: a function used to evaluate
**                 the set-reset flip-flop (SRFF) behavior.
**
**********************************************************************/


#include <stdio.h>

#define clk_cycle 32 /* number of cycles in the counter */

#include "sr.c"

int main(void)
{
    FILE *fp, *gp, *hp, *ip;
    int clk_count = 0;
    int clk;
    int i;

    int setpass = 0;
    int rstpass = 0;
    int rszf;
    int tmp_pass;
    int tmp_zfill;
    int tmp_piso;
    int scale = 0;

    int tmp_rcal;
    int cycles;

    /* flag register which holds the control signals before they are
       written to the file (zeroed so that unused signals print as 0) */

    struct
    {
        unsigned pre_bar : 1;
        unsigned inc : 1;
        unsigned load_rom : 1;
        unsigned par_rst : 1;
        unsigned r_calc : 1;
        unsigned p_calc : 1;
        unsigned p_append : 1;
        unsigned l_sipo : 1;
        unsigned sr_sipo : 1;
        unsigned sr_piso : 1;
        unsigned l_piso : 1;
        unsigned sd_sipo : 1;
        unsigned sd_piso : 1;
        unsigned mult_round : 1;
        unsigned zero_fill : 1;
        unsigned pass_out : 1;
        unsigned rst_add : 1;
        unsigned par_chk : 1;
        unsigned in_out : 1;
        unsigned up_in : 1;
    } flags = {0};

    /* master slave flip flop structure (MSFF); there are thirty-two MSFFs
       in the ring counter, which starts out cleared */

    struct msff
    {
        int clk2;
        int clk1;
    } ff[32] = {{0}}, delayst, delayrst;

    struct srff /* set reset data structure */
    {
        int set;
        int reset;
        int out;
    } shift_piso, zfill, pass;

    hp = fopen("master_control", "w"); /* open the control file */

    shift_piso.out = 1;
    pass.out = 1;
    delayst.clk1 = 1;
    delayst.clk2 = 1;
    delayrst.clk1 = 1;
    delayrst.clk2 = 1;
    zfill.out = 1;


    /* prompt for the number of clock cycles and the scale factor
       of the input data set */

    printf(" HOW MANY CLOCK CYCLES DO YOU WANT TO SIMULATE?\n");
    scanf("%d", &cycles);

again:

    printf(" WHAT IS THE SCALE FACTOR FOR THE INPUT DATA?\n");
    printf(" THE SCALE MUST BE BETWEEN 0 AND 7\n");
    scanf("%d", &scale);

    if ((scale > 7) || (scale < 0))
    {
        printf("SCALE FACTOR IS NOT WITHIN RANGE, TRY AGAIN\n");
        goto again;
    }

    printf(" COMPUTING CONTROL SIGNALS FOR %d CLOCK CYCLES\n", cycles);
    printf(" THE SCALE FACTOR IS %d\n", scale);

    fprintf(hp, "%d\n", cycles);

    /*********************************************************************/

    while (clk_count <= cycles)
    {
        clk = clk_count % clk_cycle; /* modulo 32 counter */

        /* initialize the ring counter to simulate a bit entering
           on clock cycle 0 */

        if (clk == 0)
            ff[0].clk2 = 1;

        if (clk == 1)
            ff[0].clk2 = 0;

        /* things which happen on clock 2 */

        for (i = 1; i <= 31; i++)
            ff[i].clk2 = ff[i-1].clk1;

        delayrst.clk2 = rstpass;
        delayst.clk2 = setpass;

        /* things that happen on clock 1 */

        for (i = 0; i <= 31; i++)
            ff[i].clk1 = ff[i].clk2;

        delayst.clk1 = delayst.clk2;
        delayrst.clk1 = delayrst.clk2;
        /* assignment of control signals: the adaptive scaling algorithm uses
           the if-then construct to model the PLA and SRFF behavior */

        if ((scale == 0) && (clk == 6))
            setpass = 1;
        else if ((scale == 1) && (clk == 7))
            setpass = 1;
        else if ((scale == 2) && (clk == 8))
            setpass = 1;
        else if ((scale == 3) && (clk == 9))
            setpass = 1;
        else if ((scale == 4) && (clk == 10))
            setpass = 1;
        else if ((scale == 5) && (clk == 11))
            setpass = 1;
        else if ((scale == 6) && (clk == 12))
            setpass = 1;
        else if ((scale == 7) && (clk == 11))
            setpass = 1;
        else
            setpass = 0;

        if ((scale == 0) && (clk == 29))
            rstpass = 1;
        else if ((scale == 1) && (clk == 29))
            rstpass = 1;
        else if ((scale == 2) && (clk == 29))
            rstpass = 1;
        else if ((scale == 3) && (clk == 30))
            rstpass = 1;
        else if ((scale == 4) && (clk == 31))
            rstpass = 1;
        else if ((scale == 5) && (clk == 0))
            rstpass = 1;
        else if ((scale == 6) && (clk == 1))
            rstpass = 1;
        else if ((scale == 7) && (clk == 1))
            rstpass = 1;
        else
            rstpass = 0;

        /* call the set_reset function to evaluate any possible changes in
           the set and reset variables */

        tmp_piso = set_reset(setpass, rstpass, shift_piso.out);
        shift_piso.out = tmp_piso;
        flags.sr_piso = shift_piso.out;

        if ((scale == 0) && (clk == 6))
            rszf = 1;
        else if ((scale == 1) && (clk == 7))
            rszf = 1;
        else if ((scale == 2) && (clk == 8))
            rszf = 1;
        else if ((scale == 3) && (clk == 9))
            rszf = 1;
        else if ((scale == 4) && (clk == 10))
            rszf = 1;
        else if ((scale == 5) && (clk == 10))
            rszf = 1;
        else if ((scale == 6) && (clk == 10))
            rszf = 1;
        else if ((scale == 7) && (clk == 10))
            rszf = 1;
        else
            rszf = 0;

        tmp_zfill = set_reset(ff[1].clk1, rszf, zfill.out);
        zfill.out = tmp_zfill;
        flags.zero_fill = zfill.out;

        tmp_pass = set_reset(setpass, rstpass, pass.out);
        pass.out = tmp_pass;
        flags.pass_out = pass.out;

        /* sd_sipo and sd_piso are asserted on alternate clock cycles */

        if (clk % 2 == 0)
            flags.sd_sipo = 1;
        else
            flags.sd_sipo = 0;

        if (clk % 2 == 1)
            flags.sd_piso = 1;
        else
            flags.sd_piso = 0;

        /* interval signals */

        if ((clk < 19) || (clk >= 28))
            flags.p_calc = 1;
        else
            flags.p_calc = 0;

        if ((clk < 19) || (clk >= 27))
            flags.r_calc = 1;
        else
            flags.r_calc = 0;

        if ((clk < 21) || (clk >= 29))
            flags.sr_sipo = 1;
        else
            flags.sr_sipo = 0;

        /* pulse signals */

        flags.l_piso = ff[0].clk1;
        flags.l_sipo = ff[21].clk1;
        flags.par_chk = flags.sr_piso;
        flags.par_rst = flags.l_piso;
        flags.mult_round = ff[0].clk1;
        flags.rst_add = !ff[1].clk2;
        flags.p_append = ff[19].clk1;

        /* print results to the file master_control */

        fprintf(hp, "%d\n", clk_count);

        fprintf(hp, "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n",
                flags.pass_out, flags.zero_fill, flags.par_chk, flags.par_rst,
                flags.r_calc, flags.p_calc, flags.p_append, flags.mult_round,
                flags.inc, flags.sr_piso, flags.l_piso, flags.sd_piso,
                flags.sd_sipo, flags.l_sipo, flags.sr_sipo, flags.pre_bar,
                flags.rst_add, flags.load_rom, flags.in_out, flags.up_in);

        clk_count += 1;

    } /* end while */

    fclose(hp);
    return 0;
} /* end main */
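The if-else chains in cs.c behave like a small decode keyed on the scale factor: each chain asserts its signal on exactly one clock count per 32-cycle frame. A table-driven equivalent makes those timings easier to read and check. The table entries are copied from the conditions tested in the listing, including the scale-7 entry of setpass; the array and function names are hypothetical, and this is a sketch of an alternative coding, not the author's program.

```c
/* Clock counts (mod 32) at which cs.c asserts setpass, rstpass and rszf,
   indexed by the scale factor 0..7 (values copied from the if-else chains). */
static const int setpass_clk[8] = { 6, 7, 8, 9, 10, 11, 12, 11 };
static const int rstpass_clk[8] = { 29, 29, 29, 30, 31, 0, 1, 1 };
static const int rszf_clk[8]    = { 6, 7, 8, 9, 10, 10, 10, 10 };

/* Returns 1 if the signal described by table is asserted on clock clk
   for the given scale factor, else 0. */
static int decode_pulse(const int table[8], int scale, int clk)
{
    if (scale < 0 || scale > 7)
        return 0;   /* cs.c re-prompts rather than reach this case */
    return table[scale] == clk;
}
```

Inside the main loop, `setpass = decode_pulse(setpass_clk, scale, clk);` would replace the first chain, and likewise for rstpass and rszf.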




/**********************************************************************
** DATE: 29 AUG 1985
**
** TITLE: Column controller simulation program.
**
** FILENAME: c_cntrl.c
**
** COORDINATOR: Jim Collins.
**
** PROJECT: WFT16 SIMULATION
**
** USE: This program generates the control files for the arithmetic
**      pipeline. It receives input from the file MASTER_CONTROL
**      and outputs three files: preadd_cntrl, mult_cntrl,
**      and postadd_cntrl.
**
**********************************************************************/

#include <stdio.h>
#include <stdlib.h>

#define clk_cycle 32
int main(void)
{
    FILE *fp, *hp, *ip, *jp;

    int i, clk, clk_count = 0, clk_int = 0, cycles, rst_add;

    int c_word[20];

    /* this is the structure which holds the control signals for all
       fourteen columns of the multiplier array */

    struct
    {
        unsigned reset_0 : 1;
        unsigned reset_1 : 1;
        unsigned rstdc : 1;
        unsigned s_extend : 1;
    } flags[14];

    struct msff /* master-slave flip flop data structure */
    {
        int clk2;
        int clk1;
    };

    struct msff tmp, preadd_cntrl[4], mult_cntrl[42], postadd_cntrl[3];

    /* the control pipeline is initially set to all ones; signals switch
       in response to a zero traveling through the pipe, which
       happens every thirty-two clock cycles. */

    for (i = 41; i >= 0; i--)
    {
        mult_cntrl[i].clk2 = 1;
        mult_cntrl[i].clk1 = 1;
    }

    for (i = 3; i >= 0; i--)
    {
        preadd_cntrl[i].clk2 = 1;
        preadd_cntrl[i].clk1 = 1;
    }

    for (i = 2; i >= 0; i--)
    {
        postadd_cntrl[i].clk2 = 1;
        postadd_cntrl[i].clk1 = 1;
    }

    tmp.clk2 = 1;
    tmp.clk1 = 1;


    fp = fopen("master_control", "r");  /* input word from controller */
    hp = fopen("preadd_cntrl", "w");    /* control signals for preadd column */
    ip = fopen("mult_cntrl", "w");      /* control signals for multiplier column */
    jp = fopen("postadd_cntrl", "w");   /* control signals for post add column */
    fscanf(fp, "%d", &cycles);

    /* The first word in the file (the cycle count) controls the loop. */

    while (clk_count < cycles)
    {
        fscanf(fp, "%d", &clk_count);
        clk = clk_count % clk_cycle;
        clk_int = clk_int % clk_cycle;

        for (i = 0; i <= 19; i++) /* read all 20 control signals which
                                     are sent out each clock cycle */
            fscanf(fp, "%d", &c_word[i]);

        /* start execution of program */

        rst_add = c_word[16]; /* the reset signal for the adder is
                                 in position 16 in the file */

        preadd_cntrl[0].clk2 = rst_add;

        if (clk != clk_int) /* check to ensure validity of the data */
        {
            printf("clocks are not aligned! clk = %d clk_int = %d\n",
                   clk, clk_int);
            exit(1);
        }


        /* clock two events:
           shifting operations */

        preadd_cntrl[1].clk2 = preadd_cntrl[0].clk1;
        preadd_cntrl[2].clk2 = preadd_cntrl[1].clk1;
        preadd_cntrl[3].clk2 = preadd_cntrl[2].clk1;
        mult_cntrl[0].clk2 = preadd_cntrl[3].clk1;

        for (i = 40; i >= 0; i--)
            mult_cntrl[i+1].clk2 = mult_cntrl[i].clk1;

        postadd_cntrl[0].clk2 = mult_cntrl[41].clk1;
        postadd_cntrl[1].clk2 = postadd_cntrl[0].clk1;
        postadd_cntrl[2].clk2 = postadd_cntrl[1].clk1;

        /* clock one events */

        for (i = 3; i >= 0; i--)
            preadd_cntrl[i].clk1 = preadd_cntrl[i].clk2;

        for (i = 41; i >= 0; i--)
            mult_cntrl[i].clk1 = mult_cntrl[i].clk2;

        for (i = 2; i >= 0; i--)
            postadd_cntrl[i].clk1 = postadd_cntrl[i].clk2;

        /* the multiplier signals are generated in sets of three */

        for (i = 0; i <= 13; i++)
        {
            flags[i].reset_0 = mult_cntrl[3*i + 1].clk2;
            flags[i].reset_1 = !(flags[i].reset_0);
        }

        for (i = 0; i <= 13; i++)
        {
            flags[i].rstdc = mult_cntrl[3*i + 1].clk1;

            flags[i].s_extend = !(mult_cntrl[3*i+1].clk1 & mult_cntrl[3*i+2].clk1);
        }

        /* print the output files */

        fprintf(hp, " %d", clk);
        fprintf(ip, "%d\n", clk_count);
        fprintf(jp, " %d", clk_count);

        for (i = 0; i <= 2; i++)
        {
            fprintf(hp, " %d ", preadd_cntrl[i].clk2);
            fprintf(jp, " %d ", postadd_cntrl[i].clk2);
        }

        flags[13].s_extend = 0; /* no sign extension of column thirteen */

        for (i = 0; i <= 13; i++)
            fprintf(ip, " %d %d %d %d\n", flags[i].reset_0, flags[i].reset_1,
                    flags[i].rstdc, flags[i].s_extend);

        clk_int += 1;

    } /* end while */

    return 0;
}

/* end main */
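The clock-two and clock-one loops in c_cntrl.c implement a two-phase master-slave shift register: on clock 2 each master copies the slave of the stage before it, and on clock 1 each slave copies its own master, so the pattern advances one stage per cycle. The C sketch below isolates that mechanism for the 42 multiplier stages together with the per-column decode of RSTDC and S_EXTEND. It leaves out the four preadd and three postadd stages, and the function names are hypothetical.

```c
#define STAGES 42   /* 14 multiplier columns x 3 stages, as in c_cntrl.c */

struct msff { int clk2; int clk1; };

/* One clock cycle of the two-phase control shift register: on clock 2
   each master copies the previous slave (stage 0 takes the new input);
   on clock 1 each slave copies its master. */
static void shift_cycle(struct msff st[STAGES], int in)
{
    int i;

    for (i = STAGES - 1; i > 0; i--)    /* clock 2 */
        st[i].clk2 = st[i - 1].clk1;
    st[0].clk2 = in;
    for (i = 0; i < STAGES; i++)        /* clock 1 */
        st[i].clk1 = st[i].clk2;
}

/* Column decode as in c_cntrl.c: RSTDC follows stage 3*col+1, and
   S_EXTEND is the NAND of stages 3*col+1 and 3*col+2 (forced to 0 for
   column 13). */
static void column_flags(const struct msff st[STAGES], int col,
                         int *rstdc, int *s_extend)
{
    *rstdc = st[3 * col + 1].clk1;
    *s_extend = !(st[3 * col + 1].clk1 & st[3 * col + 2].clk1);
    if (col == 13)
        *s_extend = 0;
}
```

In the steady all-ones state S_EXTEND is 0; it rises to 1 for the two cycles in which the traveling zero occupies stage 1 or stage 2 of a column, and RSTDC drops to 0 only for the cycle in which the zero sits in stage 1.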


/************************************************************************
**
** DATE: 12 NOV 1985
** AUTHOR: Jim Collins
**
** TITLE: Preadd pipeline simulation program
** FILENAME: pre_wfta.c
** PROJECT: WFT16 Simulation
** OPERATING SYSTEM: UNIX V 4.2
** LANGUAGE: C
**
** USE: This program is the third in the series which models the
**      16-point Winograd pipeline. It precedes the multiply.c
**      program.
**
** FILES READ:
**
**   master_control: control word for the processor per simulation
**                   cycle.
**   preadd_cntrl:   reset signal for the carry/borrow of the
**                   postadd column.
**   test_piso:      problem set to be used to calculate DFTs;
**                   output of bin.c (decimal-binary conversion
**                   program).
**
** FILES WRITTEN:
**
**   piso_out:   serial output of the PISO.
**   zf_out:     output of the zero fill cell.
**   preadd1_in: input to the first preadd column.
**   preadd2_in: input to the second preadd column.
**   preadd3_in: input to the third preadd column.
**   phi1_out:   output of the latch following the last adder.
**   to_mult:    input to the multiplier program.
**
** FILES INCLUDED:
**
**   typedefin: structure declarations for the program.
**   fn_add.c:  binary addition function.
**   sr.c:      evaluates the set reset function (SRFF).
**   declare:   type declarations for the program.
**
*************************************************************************/
#include <stdio.h>
#include <stdlib.h>

#include "typedefin"

#include "fn_add.c"

#include "sr.c"

#include "declare"

#define clk_cycle 32 /* 16 point wfta cycle */


    qp = fopen("col3_out", "w");
    sp = fopen("col4_out", "w");
    rp = fopen("col5_out", "w");
    tp = fopen("col6_out", "w");
    up = fopen("col7_out", "w");
    vp = fopen("col8_out", "w");
    wp = fopen("col9_out", "w");
    yp = fopen("cola_out", "w");
    zp = fopen("colb_out", "w");
    pz = fopen("colc_out", "w");
    xp = fopen("cold_out", "w");
    zzp = fopen("mult_out", "w");

    fscanf(ap, "%d", &cycles);
    while (clk_count <= cycles)
    {
        fscanf(ap, "%d", &clk_count); /* read master control word */
        for (i = 0; i <= 19; i++)
            fscanf(ap, "%d", &flags[i]);
        clk = (clk_count % clk_cycle);
        clk_int = (clk_int % clk_cycle);
        if (clk != clk_int)
        {
            printf("clocks are not synchronized %d\n", clk);
            exit(1);
        }

        mult_round = flags[7]; /* rounding signal for input
                                  to the first column */

        fscanf(bp, "%d", &clk_data);

        /********************  ASSIGNMENTS TO THE MULTIPLIER  ********************/

        /*********************  TO COLUMN 0  **************************/

        for (i = 0; i <= 7; i++)
            if (i <= 4)
            {   (p00[i])->ff1clk2 = phi1_latch[i];
                p00[i]->prod_in = 0; }
            else
            {   (p00[i])->ff1clk2 = phi1_latch[i+5];
                p00[i]->prod_in = 0; }

        p801->ff1clk2 = phi1_latch[5];
        p801->prod_in = mult_round;

        p901->ff1clk2 = phi1_latch[6];
        p901->prod_in = mult_round;

        pa0n->ff1clk2 = phi1_latch[7];
        pa0n->prod_in = mult_round;

        pb00->ff1clk2 = phi1_latch[8];
        pb00->prod_in = mult_round;

        pc0n->ff1clk2 = phi1_latch[9];
        pc0n->prod_in = mult_round;

        pd0n->ff1clk2 = phi1_latch[13];
        pd0n->prod_in = mult_round;

        pe0n->ff1clk2 = phi1_latch[14];
        pe0n->prod_in = mult_round;

        pf0n->ff1clk2 = phi1_latch[15];
        pf0n->prod_in = mult_round;

        pg0n->ff1clk2 = phi1_latch[16];
        pg0n->prod_in = mult_round;

        ph00->ff1clk2 = phi1_latch[17];
        ph00->prod_in = mult_round;

        fscanf(cp, "%d", &clk_data);
        for (i = 0; i <= 17; i++)
            fscanf(cp, "%d", &phi1_latch[i]);

        /*********************  TO COLUMN 1  **************************/

        for (i = 0; i <= 7; i++)
        {   p10[i]->ff1clk2 = p00[i]->ff3clk1;
            p10[i]->prod_in = p00[i]->sumffclk1; }

        p81n->ff1clk2 = p801->ff3clk1;
        p81n->prod_in = p801->sumffclk1;

        p91n->ff1clk2 = p901->ff3clk1;
        p91n->prod_in = p901->sumffclk1;

        pa1n->ff1clk2 = pa0n->ff3clk1;
        pa1n->prod_in = pa0n->sumffclk1;

        pb11->ff1clk2 = pb00->ff3clk1;
        pb11->prod_in = pb00->sumffclk1;

        pc10->ff1clk2 = pc0n->ff3clk1;
        pc10->prod_in = pc0n->sumffclk1;

        pd11->ff1clk2 = pd0n->ff3clk1;
        pd11->prod_in = pd0n->sumffclk1;

        pe11->ff1clk2 = pe0n->ff3clk1;
        pe11->prod_in = pe0n->sumffclk1;

        pf12->ff1clk2 = pf0n->ff3clk1;
        pf12->prod_in = pf0n->sumffclk1;

        pg10->ff1clk2 = pg0n->ff3clk1;
        pg10->prod_in = pg0n->sumffclk1;

        ph1n->ff1clk2 = ph00->ff3clk1;
        ph1n->prod_in = ph00->sumffclk1;


        /*********************  TO COLUMN 2  **************************/

        for (i = 0; i <= 7; i++)
        {   p20[i]->ff1clk2 = p10[i]->ff3clk1;
            p20[i]->prod_in = p10[i]->sumffclk1; }

        p821->ff1clk2 = p81n->ff3clk1;
        p821->prod_in = p81n->sumffclk1;

        p921->ff1clk2 = p91n->ff3clk1;
        p921->prod_in = p91n->sumffclk1;

        pa2n->ff1clk2 = pa1n->ff3clk1;
        pa2n->prod_in = pa1n->sumffclk1;

        pb2q->ff1clk2 = pb11->ff3clk1;
        pb2q->prod_in = pb11->sumffclk1;

        pc2n->ff1clk2 = pc10->ff3clk1;
        pc2n->prod_in = pc10->sumffclk1;

        pd2n->ff1clk2 = pd11->ff3clk1;
        pd2n->prod_in = pd11->sumffclk1;

        pe2n->ff1clk2 = pe11->ff3clk1;
        pe2n->prod_in = pe11->sumffclk1;

        pf20->ff1clk2 = pf12->ff3clk1;
        pf20->prod_in = pf12->sumffclk1;

        pg21->ff1clk2 = pg10->ff3clk1;
        pg21->prod_in = pg10->sumffclk1;

        ph22->ff1clk2 = ph1n->ff3clk1;
        ph22->prod_in = ph1n->sumffclk1;

        /*********************  TO COLUMN 3  **************************/

        for (i = 0; i <= 7; i++)
        {   p30[i]->ff1clk2 = p20[i]->ff3clk1;
            p30[i]->prod_in = p20[i]->sumffclk1; }

        p83n->ff1clk2 = p821->ff3clk1;
        p83n->prod_in = p821->sumffclk1;

        p93n->ff1clk2 = p921->ff3clk1;
        p93n->prod_in = p921->sumffclk1;

        pa31->ff1clk2 = pa2n->ff3clk1;
        pa31->prod_in = pa2n->sumffclk1;

        pb3n->ff1clk2 = pb2q->ff3clk1;
        pb3n->prod_in = pb2q->sumffclk1;

        pc3n->ff1clk2 = pc2n->ff3clk1;
        pc3n->prod_in = pc2n->sumffclk1;

        pd31->ff1clk2 = pd2n->ff3clk1;
        pd31->prod_in = pd2n->sumffclk1;

        pe31->ff1clk2 = pe2n->ff3clk1;
        pe31->prod_in = pe2n->sumffclk1;

        pf3q->ff1clk2 = pf20->ff3clk1;
        pf3q->prod_in = pf20->sumffclk1;

        pg31->ff1clk2 = pg21->ff3clk1;
        pg31->prod_in = pg21->sumffclk1;

        ph31->ff1clk2 = ph22->ff3clk1;
        ph31->prod_in = ph22->sumffclk1;


        /*********************  TO COLUMN 4  **************************/

        for (i = 0; i <= 7; i++)
        {   p40[i]->ff1clk2 = p30[i]->ff3clk1;
            p40[i]->prod_in = p30[i]->sumffclk1; }

        p840->ff1clk2 = p83n->ff3clk1;
        p840->prod_in = p83n->sumffclk1;

        p940->ff1clk2 = p93n->ff3clk1;
        p940->prod_in = p93n->sumffclk1;

        pa4q->ff1clk2 = pa31->ff3clk1;
        pa4q->prod_in = pa31->sumffclk1;

        pb40->ff1clk2 = pb3n->ff3clk1;
        pb40->prod_in = pb3n->sumffclk1;

        pc41->ff1clk2 = pc3n->ff3clk1;
        pc41->prod_in = pc3n->sumffclk1;

        pd40->ff1clk2 = pd31->ff3clk1;
        pd40->prod_in = pd31->sumffclk1;

        pe40->ff1clk2 = pe31->ff3clk1;
        pe40->prod_in = pe31->sumffclk1;

        pf4n->ff1clk2 = pf3q->ff3clk1;
        pf4n->prod_in = pf3q->sumffclk1;

        pg4n->ff1clk2 = pg31->ff3clk1;
        pg4n->prod_in = pg31->sumffclk1;

        ph40->ff1clk2 = ph31->ff3clk1;
        ph40->prod_in = ph31->sumffclk1;


        /*********************  TO COLUMN 5  **************************/

        for (i = 0; i <= 7; i++)
        {   p50[i]->ff1clk2 = p40[i]->ff3clk1;
            p50[i]->prod_in = p40[i]->sumffclk1; }

        p851->ff1clk2 = p840->ff3clk1;
        p851->prod_in = p840->sumffclk1;

        p951->ff1clk2 = p940->ff3clk1;
        p951->prod_in = p940->sumffclk1;

        pa50->ff1clk2 = pa4q->ff3clk1;
        pa50->prod_in = pa4q->sumffclk1;

        pb5n->ff1clk2 = pb40->ff3clk1;
        pb5n->prod_in = pb40->sumffclk1;

        pc50->ff1clk2 = pc41->ff3clk1;
        pc50->prod_in = pc41->sumffclk1;

        pd5n->ff1clk2 = pd40->ff3clk1;
        pd5n->prod_in = pd40->sumffclk1;

        pe5n->ff1clk2 = pe40->ff3clk1;
        pe5n->prod_in = pe40->sumffclk1;

        pf51->ff1clk2 = pf4n->ff3clk1;
        pf51->prod_in = pf4n->sumffclk1;

        pg50->ff1clk2 = pg4n->ff3clk1;
        pg50->prod_in = pg4n->sumffclk1;

        ph51->ff1clk2 = ph40->ff3clk1;
        ph51->prod_in = ph40->sumffclk1;

/********************* TO COLUMN 6 **************************/

for (i = 0; i <= 7; i++)
{ p60[i]->ff1clk2 = p50[i]->ff3clk1;
p60[i]->prod_in = p50[i]->sumffclk1; }

p861->ff1clk2 = p851->ff3clk1;
p861->prod_in = p851->sumffclk1;

p961->ff1clk2 = p951->ff3clk1;
p961->prod_in = p951->sumffclk1;

pa62->ff1clk2 = pa50->ff3clk1;
pa62->prod_in = pa50->sumffclk1;

pb6n->ff1clk2 = pb5n->ff3clk1;
pb6n->prod_in = pb5n->sumffclk1;

pc61->ff1clk2 = pc50->ff3clk1;
pc61->prod_in = pc50->sumffclk1;

pd6n->ff1clk2 = pd5n->ff3clk1;
pd6n->prod_in = pd5n->sumffclk1;

pe6n->ff1clk2 = pe5n->ff3clk1;
pe6n->prod_in = pe5n->sumffclk1;

pf6n->ff1clk2 = pf51->ff3clk1;
pf6n->prod_in = pf51->sumffclk1;

pg6n->ff1clk2 = pg50->ff3clk1;
pg6n->prod_in = pg50->sumffclk1;

ph61->ff1clk2 = ph51->ff3clk1;
ph61->prod_in = ph51->sumffclk1;

/********************* TO COLUMN 7 **************************/

for (i = 0; i <= 7; i++)
{ p70[i]->ff1clk2 = p60[i]->ff3clk1;
p70[i]->prod_in = p60[i]->sumffclk1; }

p870->ff1clk2 = p861->ff3clk1;
p870->prod_in = p861->sumffclk1;

p970->ff1clk2 = p961->ff3clk1;
p970->prod_in = p961->sumffclk1;

pa7n->ff1clk2 = pa62->ff3clk1;
pa7n->prod_in = pa62->sumffclk1;

pb70->ff1clk2 = pb6n->ff3clk1;
pb70->prod_in = pb6n->sumffclk1;

pc7n->ff1clk2 = pc61->ff3clk1;
pc7n->prod_in = pc61->sumffclk1;

pd70->ff1clk2 = pd6n->ff3clk1;
pd70->prod_in = pd6n->sumffclk1;

pe70->ff1clk2 = pe6n->ff3clk1;
pe70->prod_in = pe6n->sumffclk1;

pf70->ff1clk2 = pf6n->ff3clk1;
pf70->prod_in = pf6n->sumffclk1;

pg71->ff1clk2 = pg6n->ff3clk1;
pg71->prod_in = pg6n->sumffclk1;

ph70->ff1clk2 = ph61->ff3clk1;
ph70->prod_in = ph61->sumffclk1;

/********************* TO COLUMN 8 **************************/

for (i = 0; i <= 7; i++)
{ p80[i]->ff1clk2 = p70[i]->ff3clk1;
p80[i]->prod_in = p70[i]->sumffclk1; }

p880->ff1clk2 = p870->ff3clk1;
p880->prod_in = p870->sumffclk1;

p980->ff1clk2 = p970->ff3clk1;
p980->prod_in = p970->sumffclk1;

pa80->ff1clk2 = pa7n->ff3clk1;
pa80->prod_in = pa7n->sumffclk1;

pb82->ff1clk2 = pb70->ff3clk1;
pb82->prod_in = pb70->sumffclk1;

pc82->ff1clk2 = pc7n->ff3clk1;
pc82->prod_in = pc7n->sumffclk1;

pd80->ff1clk2 = pd70->ff3clk1;
pd80->prod_in = pd70->sumffclk1;

pe80->ff1clk2 = pe70->ff3clk1;
pe80->prod_in = pe70->sumffclk1;

pf82->ff1clk2 = pf70->ff3clk1;
pf82->prod_in = pf70->sumffclk1;

pg8q->ff1clk2 = pg71->ff3clk1;
pg8q->prod_in = pg71->sumffclk1;

ph8q->ff1clk2 = ph70->ff3clk1;
ph8q->prod_in = ph70->sumffclk1;

/********************* TO COLUMN 9 **************************/

for (i = 0; i <= 7; i++)
{ p90[i]->ff1clk2 = p80[i]->ff3clk1;
p90[i]->prod_in = p80[i]->sumffclk1; }

p891->ff1clk2 = p880->ff3clk1;
p891->prod_in = p880->sumffclk1;

p991->ff1clk2 = p980->ff3clk1;
p991->prod_in = p980->sumffclk1;

pa92->ff1clk2 = pa80->ff3clk1;
pa92->prod_in = pa80->sumffclk1;

pb9q->ff1clk2 = pb82->ff3clk1;
pb9q->prod_in = pb82->sumffclk1;

pc91->ff1clk2 = pc82->ff3clk1;
pc91->prod_in = pc82->sumffclk1;

pd9n->ff1clk2 = pd80->ff3clk1;
pd9n->prod_in = pd80->sumffclk1;

pe9n->ff1clk2 = pe80->ff3clk1;
pe9n->prod_in = pe80->sumffclk1;

pf9n->ff1clk2 = pf82->ff3clk1;
pf9n->prod_in = pf82->sumffclk1;

pg9n->ff1clk2 = pg8q->ff3clk1;
pg9n->prod_in = pg8q->sumffclk1;

ph92->ff1clk2 = ph8q->ff3clk1;
ph92->prod_in = ph8q->sumffclk1;

/********************* TO COLUMN 10 **************************/

for (i = 0; i <= 7; i++)
{ pa0[i]->ff1clk2 = p90[i]->ff3clk1;
pa0[i]->prod_in = p90[i]->sumffclk1; }

p8a1->ff1clk2 = p891->ff3clk1;
p8a1->prod_in = p891->sumffclk1;

p9a1->ff1clk2 = p991->ff3clk1;
p9a1->prod_in = p991->sumffclk1;

paa0->ff1clk2 = pa92->ff3clk1;
paa0->prod_in = pa92->sumffclk1;

pba0->ff1clk2 = pb9q->ff3clk1;
pba0->prod_in = pb9q->sumffclk1;

pca1->ff1clk2 = pc91->ff3clk1;
pca1->prod_in = pc91->sumffclk1;

pdan->ff1clk2 = pd9n->ff3clk1;
pdan->prod_in = pd9n->sumffclk1;

pean->ff1clk2 = pe9n->ff3clk1;
pean->prod_in = pe9n->sumffclk1;

pfa1->ff1clk2 = pf9n->ff3clk1;
pfa1->prod_in = pf9n->sumffclk1;

pgan->ff1clk2 = pg9n->ff3clk1;
pgan->prod_in = pg9n->sumffclk1;

pha0->ff1clk2 = ph92->ff3clk1;
pha0->prod_in = ph92->sumffclk1;

/********************* TO COLUMN 11 **************************/

for (i = 0; i <= 7; i++)
{ pb0[i]->ff1clk2 = pa0[i]->ff3clk1;
pb0[i]->prod_in = pa0[i]->sumffclk1; }

p8bn->ff1clk2 = p8a1->ff3clk1;
p8bn->prod_in = p8a1->sumffclk1;

p9bn->ff1clk2 = p9a1->ff3clk1;
p9bn->prod_in = p9a1->sumffclk1;

pabq->ff1clk2 = paa0->ff3clk1;
pabq->prod_in = paa0->sumffclk1;

pbb1->ff1clk2 = pba0->ff3clk1;
pbb1->prod_in = pba0->sumffclk1;

pcbn->ff1clk2 = pca1->ff3clk1;
pcbn->prod_in = pca1->sumffclk1;

pdb1->ff1clk2 = pdan->ff3clk1;
pdb1->prod_in = pdan->sumffclk1;

peb1->ff1clk2 = pean->ff3clk1;
peb1->prod_in = pean->sumffclk1;

pfb1->ff1clk2 = pfa1->ff3clk1;
pfb1->prod_in = pfa1->sumffclk1;

pgb1->ff1clk2 = pgan->ff3clk1;
pgb1->prod_in = pgan->sumffclk1;

phbn->ff1clk2 = pha0->ff3clk1;
phbn->prod_in = pha0->sumffclk1;

/********************* TO COLUMN 12 **************************/

for (i = 0; i <= 7; i++)
{ pc0[i]->ff1clk2 = pb0[i]->ff3clk1;
pc0[i]->prod_in = pb0[i]->sumffclk1; }

p8cn->ff1clk2 = p8bn->ff3clk1;
p8cn->prod_in = p8bn->sumffclk1;

p9cn->ff1clk2 = p9bn->ff3clk1;
p9cn->prod_in = p9bn->sumffclk1;

pac2->ff1clk2 = pabq->ff3clk1;
pac2->prod_in = pabq->sumffclk1;

pbc1->ff1clk2 = pbb1->ff3clk1;
pbc1->prod_in = pbb1->sumffclk1;

pcc2->ff1clk2 = pcbn->ff3clk1;
pcc2->prod_in = pcbn->sumffclk1;

pdc1->ff1clk2 = pdb1->ff3clk1;
pdc1->prod_in = pdb1->sumffclk1;

pec1->ff1clk2 = peb1->ff3clk1;
pec1->prod_in = peb1->sumffclk1;

pfc0->ff1clk2 = pfb1->ff3clk1;
pfc0->prod_in = pfb1->sumffclk1;

pgcq->ff1clk2 = pgb1->ff3clk1;
pgcq->prod_in = pgb1->sumffclk1;

phcn->ff1clk2 = phbn->ff3clk1;
phcn->prod_in = phbn->sumffclk1;

/********************* TO COLUMN 13 **************************/

for (i = 0; i <= 6; i++)
{ pd1[i]->ff1clk2 = pc0[i]->ff3clk1;
pd1[i]->prod_in = pc0[i]->sumffclk1; }

p7dn->ff1clk2 = pc0[7]->ff3clk1;
p7dn->prod_in = pc0[7]->sumffclk1;

p8d1->ff1clk2 = p8cn->ff3clk1;
p8d1->prod_in = p8cn->sumffclk1;

p9d1->ff1clk2 = p9cn->ff3clk1;
p9d1->prod_in = p9cn->sumffclk1;

pad0->ff1clk2 = pac2->ff3clk1;
pad0->prod_in = pac2->sumffclk1;

pbd1->ff1clk2 = pbc1->ff3clk1;
pbd1->prod_in = pbc1->sumffclk1;

pcdn->ff1clk2 = pcc2->ff3clk1;
pcdn->prod_in = pcc2->sumffclk1;

pddn->ff1clk2 = pdc1->ff3clk1;
pddn->prod_in = pdc1->sumffclk1;

pedn->ff1clk2 = pec1->ff3clk1;
pedn->prod_in = pec1->sumffclk1;

pfdn->ff1clk2 = pfc0->ff3clk1;
pfdn->prod_in = pfc0->sumffclk1;

pgd1->ff1clk2 = pgcq->ff3clk1;
pgd1->prod_in = pgcq->sumffclk1;

phdn->ff1clk2 = phcn->ff3clk1;
phdn->prod_in = phcn->sumffclk1;

/********************* COLUMN 0 OF THE MULTIPLIERS **************************/
fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);


fprintf(zp, " prod_in = %d ", ph00->prod_in); }


for (i = 0; i <= 7; i++)
m0(p00[i], sign_ext);

m1(p801, reset_0, sign_ext);
m1(p901, reset_0, sign_ext);
n1(pa0n, reset_1, sign_ext);
m0(pb00, sign_ext);
n1(pc0n, reset_1, sign_ext);
n1(pd0n, reset_1, sign_ext);
n1(pe0n, reset_1, sign_ext);
n1(pf0n, reset_1, sign_ext);
n1(pg0n, reset_1, sign_ext);
m0(ph00, sign_ext);

/********************* COLUMN 1 OF THE MULTIPLIER **************************/

/* read the input control signals for this column before calling the
multiplier functions */

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p10[i], sign_ext);

n1(p81n, reset_1, sign_ext);
n1(p91n, reset_1, sign_ext);
n1(pa1n, reset_1, sign_ext);
m1(pb11, reset_0, sign_ext);
m0(pc10, sign_ext);
m1(pd11, reset_0, sign_ext);
m1(pe11, reset_0, sign_ext);
m2(pf12, reset_0, rstdc, sign_ext);
m0(pg10, sign_ext);
n1(ph1n, reset_1, sign_ext);


/********************* COLUMN 2 OF THE MULTIPLIER **************************/
fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p20[i], sign_ext);

m1(p821, reset_0, sign_ext);
m1(p921, reset_0, sign_ext);
n1(pa2n, reset_1, sign_ext);
n2(pb2q, reset_1, rstdc, sign_ext);
n1(pc2n, reset_1, sign_ext);
n1(pd2n, reset_1, sign_ext);
n1(pe2n, reset_1, sign_ext);
m0(pf20, sign_ext);
m1(pg21, reset_0, sign_ext);
m2(ph22, reset_0, rstdc, sign_ext);


/********************* COLUMN 3 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p30[i], sign_ext);

n1(p83n, reset_1, sign_ext);
n1(p93n, reset_1, sign_ext);
m1(pa31, reset_0, sign_ext);
n1(pb3n, reset_1, sign_ext);
n1(pc3n, reset_1, sign_ext);
m1(pd31, reset_0, sign_ext);
m1(pe31, reset_0, sign_ext);
n2(pf3q, reset_1, rstdc, sign_ext);
m1(pg31, reset_0, sign_ext);
m1(ph31, reset_0, sign_ext);


/********************* COLUMN 4 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p40[i], sign_ext);

m0(p840, sign_ext);
m0(p940, sign_ext);
n2(pa4q, reset_1, rstdc, sign_ext);
m0(pb40, sign_ext);
m1(pc41, reset_0, sign_ext);
m0(pd40, sign_ext);
m0(pe40, sign_ext);
n1(pf4n, reset_1, sign_ext);
n1(pg4n, reset_1, sign_ext);
m0(ph40, sign_ext);

/********************* COLUMN 5 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p50[i], sign_ext);

m1(p851, reset_0, sign_ext);
m1(p951, reset_0, sign_ext);
m0(pa50, sign_ext);
n1(pb5n, reset_1, sign_ext);
m0(pc50, sign_ext);
n1(pd5n, reset_1, sign_ext);
n1(pe5n, reset_1, sign_ext);
m1(pf51, reset_0, sign_ext);
m0(pg50, sign_ext);
m1(ph51, reset_0, sign_ext);


/********************* COLUMN 6 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p60[i], sign_ext);

m1(p861, reset_0, sign_ext);
m1(p961, reset_0, sign_ext);
m2(pa62, reset_0, rstdc, sign_ext);
n1(pb6n, reset_1, sign_ext);
m1(pc61, reset_0, sign_ext);
n1(pd6n, reset_1, sign_ext);
n1(pe6n, reset_1, sign_ext);
n1(pf6n, reset_1, sign_ext);
n1(pg6n, reset_1, sign_ext);
m1(ph61, reset_0, sign_ext);


/********************* COLUMN 7 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p70[i], sign_ext);

m0(p870, sign_ext);
m0(p970, sign_ext);
n1(pa7n, reset_1, sign_ext);
m0(pb70, sign_ext);
n1(pc7n, reset_1, sign_ext);
m0(pd70, sign_ext);
m0(pe70, sign_ext);
m0(pf70, sign_ext);
m1(pg71, reset_0, sign_ext);
m0(ph70, sign_ext);


/********************* COLUMN 8 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p80[i], sign_ext);

m0(p880, sign_ext);
m0(p980, sign_ext);


m0(pa80, sign_ext);
m2(pb82, reset_0, rstdc, sign_ext);
m2(pc82, reset_0, rstdc, sign_ext);
m0(pd80, sign_ext);
m0(pe80, sign_ext);
m2(pf82, reset_0, rstdc, sign_ext);
n2(pg8q, reset_1, rstdc, sign_ext);
n2(ph8q, reset_1, rstdc, sign_ext);


/********************* COLUMN 9 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(p90[i], sign_ext);

m1(p891, reset_0, sign_ext);
m1(p991, reset_0, sign_ext);
m2(pa92, reset_0, rstdc, sign_ext);
n2(pb9q, reset_1, rstdc, sign_ext);
m1(pc91, reset_0, sign_ext);
n1(pd9n, reset_1, sign_ext);
n1(pe9n, reset_1, sign_ext);
n1(pf9n, reset_1, sign_ext);
n1(pg9n, reset_1, sign_ext);
m2(ph92, reset_0, rstdc, sign_ext);


/********************* COLUMN 10 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d\n", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(pa0[i], sign_ext);

m1(p8a1, reset_0, sign_ext);
m1(p9a1, reset_0, sign_ext);
m0(paa0, sign_ext);
m0(pba0, sign_ext);
m1(pca1, reset_0, sign_ext);
n1(pdan, reset_1, sign_ext);
n1(pean, reset_1, sign_ext);
m1(pfa1, reset_0, sign_ext);
n1(pgan, reset_1, sign_ext);
m0(pha0, sign_ext);


/********************* COLUMN 11 OF THE MULTIPLIER **************************/

fscanf(bp, "%d%d%d%d\n", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(pb0[i], sign_ext);

n1(p8bn, reset_1, sign_ext);
n1(p9bn, reset_1, sign_ext);
n2(pabq, reset_1, rstdc, sign_ext);
m1(pbb1, reset_0, sign_ext);
n1(pcbn, reset_1, sign_ext);
m1(pdb1, reset_0, sign_ext);
m1(peb1, reset_0, sign_ext);
m1(pfb1, reset_0, sign_ext);
m1(pgb1, reset_0, sign_ext);
n1(phbn, reset_1, sign_ext);


/********************* COLUMN 12 OF THE MULTIPLIER **************************/
fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 7; i++)
m0(pc0[i], sign_ext);

n1(p8cn, reset_1, sign_ext);
n1(p9cn, reset_1, sign_ext);
m2(pac2, reset_0, rstdc, sign_ext);
m1(pbc1, reset_0, sign_ext);
m2(pcc2, reset_0, rstdc, sign_ext);
m1(pdc1, reset_0, sign_ext);
m1(pec1, reset_0, sign_ext);
m0(pfc0, sign_ext);
n2(pgcq, reset_1, rstdc, sign_ext);
n1(phcn, reset_1, sign_ext);


/********************* COLUMN 13 OF THE MULTIPLIER **************************/
fscanf(bp, "%d%d%d%d", &reset_0, &reset_1, &rstdc, &sign_ext);

for (i = 0; i <= 6; i++)
m1(pd1[i], reset_0, sign_ext);

n1(p7dn, reset_1, sign_ext);
m1(p8d1, reset_0, sign_ext);
m1(p9d1, reset_0, sign_ext);
m0(pad0, sign_ext);
m1(pbd1, reset_0, sign_ext);
n1(pcdn, reset_1, sign_ext);
n1(pddn, reset_1, sign_ext);
n1(pedn, reset_1, sign_ext);
n1(pfdn, reset_1, sign_ext);
m1(pgd1, reset_0, sign_ext);
n1(phdn, reset_1, sign_ext);


/********************* PRINT results *****************************/

if (clk_count >= 39)
{ fprintf(np, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(np, "%d ", (p00[i])->sumffclk1);

fprintf(np, " %d %d %d %d %d %d %d %d %d %d ", p801->sumffclk1,
p901->sumffclk1, pa0n->sumffclk1, pb00->sumffclk1,
pc0n->sumffclk1, pd0n->sumffclk1, pe0n->sumffclk1,
pf0n->sumffclk1, pg0n->sumffclk1, ph00->sumffclk1); }

if (clk_count >= 42)
{ fprintf(op, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(op, "%d ", (p10[i])->sumffclk1);
fprintf(op, " %d %d %d %d %d %d %d %d %d %d ", p81n->sumffclk1,
p91n->sumffclk1, pa1n->sumffclk1, pb11->sumffclk1,
pc10->sumffclk1, pd11->sumffclk1, pe11->sumffclk1,
pf12->sumffclk1, pg10->sumffclk1, ph1n->sumffclk1); }

if (clk_count >= 45)
{ fprintf(pp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(pp, "%d ", (p20[i])->sumffclk1);
fprintf(pp, " %d %d %d %d %d %d %d %d %d %d ", p821->sumffclk1,
p921->sumffclk1, pa2n->sumffclk1, pb2q->sumffclk1,
pc2n->sumffclk1, pd2n->sumffclk1, pe2n->sumffclk1,
pf20->sumffclk1, pg21->sumffclk1, ph22->sumffclk1); }


if (clk_count >= 48)
{ fprintf(qp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(qp, "%d ", (p30[i])->sumffclk1);

fprintf(qp, " %d %d %d %d %d %d %d %d %d %d ", p83n->sumffclk1,
p93n->sumffclk1, pa31->sumffclk1, pb3n->sumffclk1,
pc3n->sumffclk1, pd31->sumffclk1, pe31->sumffclk1,
pf3q->sumffclk1, pg31->sumffclk1, ph31->sumffclk1); }

if (clk_count >= 51)
{ fprintf(sp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(sp, "%d ", (p40[i])->sumffclk1);
fprintf(sp, " %d %d %d %d %d %d %d %d %d %d ", p840->sumffclk1,
p940->sumffclk1, pa4q->sumffclk1, pb40->sumffclk1,
pc41->sumffclk1, pd40->sumffclk1, pe40->sumffclk1,
pf4n->sumffclk1, pg4n->sumffclk1, ph40->sumffclk1); }

if (clk_count >= 54)
{ fprintf(rp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(rp, "%d ", (p50[i])->sumffclk1);

fprintf(rp, " %d %d %d %d %d %d %d %d %d %d ", p851->sumffclk1,
p951->sumffclk1, pa50->sumffclk1, pb5n->sumffclk1,
pc50->sumffclk1, pd5n->sumffclk1, pe5n->sumffclk1,
pf51->sumffclk1, pg50->sumffclk1, ph51->sumffclk1); }

if (clk_count >= 57)
{ fprintf(tp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(tp, "%d ", (p60[i])->sumffclk1);
fprintf(tp, " %d %d %d %d %d %d %d %d %d %d ", p861->sumffclk1,
p961->sumffclk1, pa62->sumffclk1, pb6n->sumffclk1,
pc61->sumffclk1, pd6n->sumffclk1, pe6n->sumffclk1,
pf6n->sumffclk1, pg6n->sumffclk1, ph61->sumffclk1); }

if (clk_count >= 60)
{ fprintf(up, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(up, "%d ", (p70[i])->sumffclk1);
fprintf(up, " %d %d %d %d %d %d %d %d %d %d ", p870->sumffclk1,
p970->sumffclk1, pa7n->sumffclk1, pb70->sumffclk1,
pc7n->sumffclk1, pd70->sumffclk1, pe70->sumffclk1,
pf70->sumffclk1, pg71->sumffclk1, ph70->sumffclk1); }

if (clk_count >= 63)
{ fprintf(vp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(vp, "%d ", (p80[i])->sumffclk1);
fprintf(vp, " %d %d %d %d %d %d %d %d %d %d ", p880->sumffclk1,
p980->sumffclk1, pa80->sumffclk1, pb82->sumffclk1,
pc82->sumffclk1, pd80->sumffclk1, pe80->sumffclk1,
pf82->sumffclk1, pg8q->sumffclk1, ph8q->sumffclk1); }

if (clk_count >= 66)
{ fprintf(wp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(wp, "%d ", (p90[i])->sumffclk1);
fprintf(wp, " %d %d %d %d %d %d %d %d %d %d ", p891->sumffclk1,
p991->sumffclk1, pa92->sumffclk1, pb9q->sumffclk1,
pc91->sumffclk1, pd9n->sumffclk1, pe9n->sumffclk1,
pf9n->sumffclk1, pg9n->sumffclk1, ph92->sumffclk1); }

if (clk_count >= 69)
{ fprintf(yp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(yp, "%d ", (pa0[i])->sumffclk1);
fprintf(yp, " %d %d %d %d %d %d %d %d %d %d ", p8a1->sumffclk1,
p9a1->sumffclk1, paa0->sumffclk1, pba0->sumffclk1,
pca1->sumffclk1, pdan->sumffclk1, pean->sumffclk1,
pfa1->sumffclk1, pgan->sumffclk1, pha0->sumffclk1); }

if (clk_count >= 72)
{ fprintf(zp, " %d ", clk_int);
for (i = 0; i <= 7; i++)
fprintf(zp, "%d ", (pb0[i])->sumffclk1);

fprintf(zp, " %d %d %d %d %d %d %d %d %d %d ", p8bn->sumffclk1,
p9bn->sumffclk1, pabq->sumffclk1, pbb1->sumffclk1,
pcbn->sumffclk1, pdb1->sumffclk1, peb1->sumffclk1,
pfb1->sumffclk1, pgb1->sumffclk1, phbn->sumffclk1); }

if (clk_count >= 75)
{ fprintf(pz, " %d ", clk_count);
for (i = 0; i <= 7; i++)
fprintf(pz, "%d ", (pc0[i])->sumffclk1);
fprintf(pz, " %d %d %d %d %d %d %d %d %d %d ", p8cn->sumffclk1,
p9cn->sumffclk1, pac2->sumffclk1, pbc1->sumffclk1,
pcc2->sumffclk1, pdc1->sumffclk1, pec1->sumffclk1,
pfc0->sumffclk1, pgcq->sumffclk1, phcn->sumffclk1); }

if (clk_count >= 78)
{ fprintf(xp, " %d ", clk_count);
for (i = 0; i <= 6; i++)
fprintf(xp, "%d ", (pd1[i])->sumffclk1);

fprintf(xp, " %d %d %d %d %d %d %d %d %d %d %d ", p7dn->sumffclk1,
p8d1->sumffclk1, p9d1->sumffclk1, pad0->sumffclk1,
pbd1->sumffclk1, pcdn->sumffclk1, pddn->sumffclk1,
pedn->sumffclk1, pfdn->sumffclk1, pgd1->sumffclk1,
phdn->sumffclk1); }

fprintf(zzp, " %d ", clk_count);
for (i = 0; i <= 6; i++)
fprintf(zzp, "%d ", (pd1[i])->sumffclk1);

fprintf(zzp, " %d %d %d %d %d %d %d %d %d %d %d ", p7dn->sumffclk1,
p8d1->sumffclk1, p9d1->sumffclk1, pad0->sumffclk1,
pbd1->sumffclk1, pcdn->sumffclk1, pddn->sumffclk1,
pedn->sumffclk1, pfdn->sumffclk1, pgd1->sumffclk1,
phdn->sumffclk1);

clk_int += 1;


fclose(ap); 

fclose(bp); 

fclose(cp); 

fclose(np); 

fclose(op); 

fclose(pp); 

fclose(qp); 

fclose(pz); 

fclose(sp); 

fclose(tp);

fclose(up); 

fclose(vp); 

fclose(wp); 

fclose(xp); 

fclose(yp); 

fclose(zp); 

fclose(zzp); 

} 
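Each pass of the main loop above repeats the same wiring for every row: the cell in column k takes column k-1's ff3clk1 as its ff1clk2 input and column k-1's sumffclk1 as its prod_in. Because every transfer reads only clk1 (master) values and writes only ff1clk2 or prod_in, the order in which the columns are visited does not change the result. The sketch below expresses that wiring as one loop over a two-dimensional array. It is an illustration under stated assumptions, not the WFT16 code: struct cell, NROWS, NCOLS and transfer_columns() are hypothetical names, and the sketch does not model how column 0 is loaded, which this part of the listing does not show.

```c
#define NROWS 18 /* hypothetical: rows 0-7 plus rows 8, 9 and a-h */
#define NCOLS 14 /* hypothetical: multiplier columns 0-13 */

/* Hypothetical generic cell: only the fields the column transfer touches. */
struct cell {
    int ff1clk1, ff1clk2;
    int ff2clk1, ff2clk2;
    int ff3clk1, ff3clk2;
    int sumffclk1, sumffclk2;
    int prod_in;
};

/* Wire column col-1 into column col for every row, once per clock,
   before the multiplier functions are called.  Only clk1 fields are
   read and only ff1clk2/prod_in are written, so the order in which
   the columns are visited does not affect the result. */
static void transfer_columns(struct cell g[NROWS][NCOLS])
{
    for (int col = 1; col < NCOLS; col++) {
        for (int row = 0; row < NROWS; row++) {
            g[row][col].ff1clk2 = g[row][col - 1].ff3clk1;
            g[row][col].prod_in = g[row][col - 1].sumffclk1;
        }
    }
}
```

Keeping each cell's type (m0, m1, m2, n1 or n2) in a parallel table would likewise let the per-column multiplier calls collapse into a single loop; that is a design option, not something the listing does.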

ds  In  Functions' 


5.1.1. 


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********************************************************************* 

DATE:  29  AUG  1985 

TITLE:  multiplier  functions 

FILENAME:  submO.c,  subml.c,  subm2.c  subnl.c,  subn2.c 

COORDINATOR:  Jim  Collins. 

PROJECT:  WFT16  SIMULATION 

USE:  These  are  the  functions  called  by  the  multiply. c  program 
to  evaluate  the  bits  in  the  data  structure  pointed  to 
by  the  position  in  the  array  in  the  same 
fashion  the  hardware  multipliers  will  do. 


********************************************************************** 


/*  zero  multiplier  cell  */ 

m0(ptr, sign_ext)
struct multX0 *ptr;
int sign_ext;

{

/* things that happen on clock two */

if (sign_ext == 0)
ptr->sumffclk2 = ptr->prod_in;
else
ptr->sumffclk2 = ptr->sumffclk2; /* hold the last sum bit (sign extension) */

ptr->ff2clk2 = ptr->ff1clk1;
ptr->ff3clk2 = ptr->ff2clk1;

/* things that happen on clk1 */

ptr->ff1clk1 = ptr->ff1clk2;
ptr->ff2clk1 = ptr->ff2clk2;
ptr->ff3clk1 = ptr->ff3clk2;
ptr->sumffclk1 = ptr->sumffclk2;


return; 

} 
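m0() follows the two-phase discipline shared by all the cell functions: the clock-two statements compute every clk2 value from clk1 values only, and the clock-one statements then copy clk2 into clk1. A bit written into ff1clk2 therefore appears in ff1clk1, ff2clk1 and ff3clk1 on three successive calls, and with sign_ext set the sum output simply repeats its last bit. The sketch below reproduces that behaviour with a hypothetical struct zero_cell and zero_cell_step() standing in for struct multX0 and m0(), whose declarations are not part of this listing; only the fields m0() touches are modelled.

```c
/* Hypothetical stand-in for struct multX0 (not shown in the listing). */
struct zero_cell {
    int ff1clk1, ff1clk2;
    int ff2clk1, ff2clk2;
    int ff3clk1, ff3clk2;
    int sumffclk1, sumffclk2;
    int prod_in;
};

/* One clock of a zero-multiplier cell, mirroring m0(). */
static void zero_cell_step(struct zero_cell *c, int sign_ext)
{
    /* clock two: new clk2 values come only from clk1 values and inputs */
    if (sign_ext == 0)
        c->sumffclk2 = c->prod_in;   /* times zero: pass the partial product on */
    /* sign_ext == 1: sumffclk2 keeps its old value (sign extension) */
    c->ff2clk2 = c->ff1clk1;
    c->ff3clk2 = c->ff2clk1;

    /* clock one: copy the clk2 side into the clk1 side */
    c->ff1clk1 = c->ff1clk2;
    c->ff2clk1 = c->ff2clk2;
    c->ff3clk1 = c->ff3clk2;
    c->sumffclk1 = c->sumffclk2;
}
```

Separating the two phases is what lets the main program call the cells in any order within one clock: no cell ever reads a value that another cell has already updated in the same phase.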
/* plus one multiply function */

m1(ptr, reset_0, sign_ext)
struct multX1 *ptr;
int reset_0, sign_ext;

{

int add = 0;

if (reset_0 == 0)

{

ptr->carry_ffclk2 = 0; /* reset the carry flip-flop to 0 */

ptr->carry_ffclk1 = 0;

}

/* things that happen on clock two */

add = ptr->ff2clk1 + ptr->prod_in + ptr->carry_ffclk1;

switch(add)

{

case 0:

ptr->tmpsum = 0;
ptr->carry_ffclk2 = 0;
break;

case 1:

ptr->tmpsum = 1;
ptr->carry_ffclk2 = 0;
break;

case 2:

ptr->tmpsum = 0;
ptr->carry_ffclk2 = 1;
break;

case 3:

ptr->tmpsum = 1;
ptr->carry_ffclk2 = 1;
break;

} /* end case */

if (sign_ext != 1)
ptr->sumffclk2 = ptr->tmpsum;

ptr->ff2clk2 = ptr->ff1clk1;
ptr->ff3clk2 = ptr->ff2clk1;

/* things that happen on clk1 */

ptr->ff1clk1 = ptr->ff1clk2;
ptr->ff2clk1 = ptr->ff2clk2;
ptr->ff3clk1 = ptr->ff3clk2;


ptr->carry_ffclk1 = ptr->carry_ffclk2;
ptr->sumffclk1 = ptr->sumffclk2;

return;

}
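The switch in m1() is a full adder over three bits: the total of ff2clk1, prod_in and carry_ffclk1 is 0 to 3, bit 0 of that total is the sum output and bit 1 is the carry held for the next bit time. A minimal sketch of the same arithmetic on whole words follows, assuming the words enter least-significant bit first (the listing does not state the bit order here); full_add() and serial_add() are illustrative names, not functions from the program.

```c
/* Hypothetical helper: one bit time of the full adder in m1().
   add is 0..3; bit 0 is the sum, bit 1 the carry kept for the next bit. */
static int full_add(int x, int y, int *carry)
{
    int add = x + y + *carry;
    *carry = add >> 1;
    return add & 1;
}

/* Hypothetical helper: add two nbits-wide words fed least-significant bit
   first (assumed order), the way a +1 cell adds its data bit (ff2clk1)
   to the incoming partial product (prod_in).  Result is modulo 2^nbits. */
static unsigned serial_add(unsigned a, unsigned b, int nbits)
{
    int carry = 0;                       /* cleared, like reset_0 == 0 */
    unsigned result = 0;
    for (int k = 0; k < nbits; k++) {
        int s = full_add((int)((a >> k) & 1u), (int)((b >> k) & 1u), &carry);
        result |= (unsigned)s << k;
    }
    return result;
}
```

The sketch processes a whole word per call for brevity, whereas m1() advances one bit per clock and keeps the carry in carry_ffclk1/carry_ffclk2 between calls.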


/* plus two multiplier */


m2(ptr, reset_0, rstdc, s_extend)

struct multX2 *ptr;

int reset_0, rstdc, s_extend;

{

int add = 0;
int addin = 0;

if (reset_0 == 0)

{

ptr->carry_ffclk1 = 0; /* reset the carry flip-flop to 0 */
ptr->carry_ffclk2 = 0; /* reset the carry flip-flop to 0 */
}

/* things that happen on clock two */

addin = (ptr->ff3clk1 && rstdc);

add = addin + ptr->prod_in + ptr->carry_ffclk1;

switch(add)

{

case 0:

ptr->tmpsum = 0;
ptr->carry_ffclk2 = 0;
break;

case 1:

ptr->tmpsum = 1;
ptr->carry_ffclk2 = 0;
break;

case 2:

ptr->tmpsum = 0;
ptr->carry_ffclk2 = 1;
break;

case 3:

ptr->tmpsum = 1;
ptr->carry_ffclk2 = 1;
break;

} /* end case */

if (s_extend != 1)
ptr->sumffclk2 = ptr->tmpsum;


ptr->ff2clk2 = ptr->ff1clk1;
ptr->ff3clk2 = ptr->ff2clk1;

/* things that happen on clk1 */

ptr->ff1clk1 = ptr->ff1clk2;
ptr->ff2clk1 = ptr->ff2clk2;
ptr->ff3clk1 = ptr->ff3clk2;
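m2() differs from m1() in taking its data bit from ff3clk1, one bit time later than ff2clk1, gated by rstdc. Assuming words travel least-significant bit first, that extra delay shifts the operand left by one place, that is, multiplies it by two, and holding rstdc at 0 injects the 0 that belongs in the least significant position. This reading of rstdc is an interpretation, not something the listing states. A sketch of the resulting arithmetic, a + 2x computed bit-serially, using the hypothetical function serial_add_twice():

```c
/* Hypothetical helper: a + 2*x on nbits-wide words, least-significant bit
   first (assumed).  'delayed' stands for the extra flip-flop (the ff3 tap
   rather than ff2); starting it at 0 models the assumed role of rstdc,
   which keeps the previous word's bit out of the least significant place. */
static unsigned serial_add_twice(unsigned a, unsigned x, int nbits)
{
    int carry = 0;     /* carry flip-flop, cleared like reset_0 == 0 */
    int delayed = 0;   /* x bit from the previous bit time */
    unsigned result = 0;
    for (int k = 0; k < nbits; k++) {
        int add = (int)((a >> k) & 1u) + delayed + carry;   /* 0..3 */
        result |= (unsigned)(add & 1) << k;
        carry = add >> 1;
        delayed = (int)((x >> k) & 1u);
    }
    return result;
}
```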
/* minus one multiplier function */


n1(ptr, reset_1, s_extend)
struct multN1 *ptr;
int reset_1, s_extend;

{

int add = 0;

/* things that happen on clock two */

if (reset_1 == 1)

{

ptr->carry_ffclk1 = 1; /* reset the carry flip-flop to 1 */

ptr->carry_ffclk2 = 1;

}

add = !(ptr->ff2clk1) + ptr->prod_in + ptr->carry_ffclk1;

switch(add)

{

case 0:

ptr->tmpsum = 0;
ptr->carry_ffclk2 = 0;
break;

case 1:

ptr->tmpsum = 1;
ptr->carry_ffclk2 = 0;
break;

case 2:

ptr->tmpsum = 0;
ptr->carry_ffclk2 = 1;
break;

case 3:

ptr->tmpsum = 1;
ptr->carry_ffclk2 = 1;
break;


} /* end case */

if (s_extend != 1)
ptr->sumffclk2 = ptr->tmpsum;

ptr->ff2clk2 = ptr->ff1clk1;
ptr->ff3clk2 = ptr->ff2clk1;

/* things that happen on clk1 */


ptr->ff1clk1 = ptr->ff1clk2;
ptr->ff2clk1 = ptr->ff2clk2;
ptr->ff3clk1 = ptr->ff3clk2;

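The minus-one cell feeds the inverted operand `!(ptr->ff2clk1)` into the same full-adder switch and presets its carry flip-flop to 1 on reset. Together these form the two's complement identity a - b = a + ~b + 1, evaluated one bit per clock. A minimal sketch, assuming LSB-first words of `nbits` bits, with the hypothetical function name `serial_sub_word`:

```c
/* a - b computed bit-serially as a + ~b + 1: each b bit is inverted and the
   carry latch is preset to 1 at the start of the word, as n1() does on reset.
   The result is an nbits-wide two's complement value. */
unsigned long serial_sub_word(unsigned long a, unsigned long b, int nbits)
{
    int carry = 1;                            /* preset value for a minus cell */
    unsigned long result = 0;

    for (int i = 0; i < nbits; i++) {
        int abit = (int)((a >> i) & 1UL);
        int nbbit = ((b >> i) & 1UL) ? 0 : 1; /* inverted operand bit */
        int total = abit + nbbit + carry;     /* same 0..3 cases as the switch */
        result |= (unsigned long)(total & 1) << i;
        carry = total >> 1;
    }
    return result;
}
```

Here `serial_sub_word(3, 10, 8)` yields 249, which is -7 in 8-bit two's complement.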

/* minus two multiplier */


n2(ptr, reset_1, rstdc, s_extend)

struct multN2 *ptr;

int reset_1, rstdc, s_extend;

{

int add = 0;
int addin = 0;

if (reset_1 == 1)
{
    ptr->carry_ffclk1 = 1;   /* preset the carry flip-flop to 1 */
    ptr->carry_ffclk2 = 1;   /* preset the carry flip-flop to 1 */
}

addin = !(ptr->ff3clk1 && rstdc);

add = addin + ptr->prod_in + ptr->carry_ffclk1;

switch (add)
{

case 0:
    ptr->tmpsum = 0;
    ptr->carry_ffclk2 = 0;
    break;

case 1:
    ptr->tmpsum = 1;
    ptr->carry_ffclk2 = 0;
    break;

case 2:
    ptr->tmpsum = 0;
    ptr->carry_ffclk2 = 1;
    break;

case 3:
    ptr->tmpsum = 1;
    ptr->carry_ffclk2 = 1;
    break;

} /* end case */

if (s_extend != 1)
    ptr->sumffclk2 = ptr->tmpsum;

ptr->ff2clk2 = ptr->ff1clk1;
ptr->ff3clk2 = ptr->ff2clk1;

/* things that happen on clk1 */

ptr->ff1clk1 = ptr->ff1clk2;
ptr->ff2clk1 = ptr->ff2clk2;
ptr->ff3clk1 = ptr->ff3clk2;


ptr->carry_ffclk1 = ptr->carry_ffclk2;
ptr->sumffclk1 = ptr->sumffclk2;


return;

}
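The minus-two cell combines the previous two ideas: the operand is taken one bit later (from `ff3`), inverted, and added with a carry preset to 1, giving a - 2b. When `rstdc` is 0, the expression `!(ptr->ff3clk1 && rstdc)` yields 1, which matches bit 0 of ~(2b). The sketch below, with the hypothetical name `serial_sub_double`, assumes LSB-first words:

```c
/* a - 2b computed bit-serially: b is delayed one bit (doubling it), inverted,
   and added to a with the carry preset to 1. At i == 0 the delayed bit is 0,
   so its inverse is 1, mirroring !(ff3clk1 && rstdc) with rstdc == 0. */
unsigned long serial_sub_double(unsigned long a, unsigned long b, int nbits)
{
    int carry = 1;       /* minus cell: carry preset to 1 */
    int delay = 0;       /* one-bit delay register for b */
    unsigned long result = 0;

    for (int i = 0; i < nbits; i++) {
        int abit = (int)((a >> i) & 1UL);
        int bdelayed = (i == 0) ? 0 : delay;    /* bit i of 2b */
        int total = abit + !bdelayed + carry;   /* full-adder cases 0..3 */
        result |= (unsigned long)(total & 1) << i;
        carry = total >> 1;
        delay = (int)((b >> i) & 1UL);
    }
    return result;
}
```

For instance, `serial_sub_double(1, 3, 8)` returns 251, the 8-bit two's complement form of 1 - 6 = -5.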
/**********************************************************************

DATE: 12 NOV 1985

AUTHOR: Jim Collins
FILENAME: post_wfta.c
PROJECT: WFT16 Simulation
OPERATING SYSTEM: UNIX V 4.2
LANGUAGE: C

USE: This program is the third in the series that models the
     16-point Winograd pipeline. It follows the multiply.c
     program.

FILES READ:

master_control:  control word for the processor per simulation
                 cycle.
postadd_cntrl:   reset signal for the carry/borrow of the
                 postadd columns.
rmult_out:       output of the real pass through the serial
                 multiplier; input to the real postadders.
imult_out:       output of the imaginary pass through the serial
                 multiplier; input to the imaginary postadders.

FILES WRITTEN:

rpostadd1_in:  input to the first column of the real postadders.
rpostadd2_in:  input to the second column of the real postadders.
rpostadd3_in:  input to the third column of the real postadders.
rprcell_in:    input to the real parity round cell.
rprcell_out:   output of the real parity round cell.
rsipo_out:     output of real results.

ipostadd1_in:  input to the first column of the imaginary postadders.
ipostadd2_in:  input to the second column of the imaginary postadders.
ipostadd3_in:  input to the third column of the imaginary postadders.
iprcell_in:    input to the imaginary parity round cell.
isipo_out:     output of imaginary results.

FILES INCLUDED:

fn_add.c:  addition function
postdec:   type and structure declarations for the program.

**********************************************************************/

#include <stdio.h>
#include <stdlib.h>

#include "fn_add.c"

#include "postdec"

#define clk_cycle 32   /* 16-point WFTA cycle */

struct add_cell
{
    int x_1;
    int y_1;
    int c_1;
    int b_1;
};

struct add2_cell
{
    int sum;
    int diff;
    int co;
    int bo;
};

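The `add_cell` and `add2_cell` structures suggest that each postadder cell forms a sum and a difference of the same two bit streams, holding a carry (`c_1`) and a borrow (`b_1`) between clocks. The `add()` routine itself lives in fn_add.c, which is not reproduced here, so the sketch below is an assumption about its behavior rather than a copy: standard full-adder and full-subtractor equations, with the outputs `co` and `bo` fed back into `c_1` and `b_1` each bit time as `main()` does. The names `add_step` and `add_sub_word` are hypothetical.

```c
struct add_cell  { int x_1, y_1, c_1, b_1; };  /* inputs and latched carry/borrow */
struct add2_cell { int sum, diff, co, bo; };   /* outputs */

/* Assumed one-bit step of a combined serial adder/subtractor. */
void add_step(const struct add_cell *in, struct add2_cell *out)
{
    int x = in->x_1, y = in->y_1;

    out->sum  = x ^ y ^ in->c_1;                           /* x + y + c */
    out->co   = (x & y) | (x & in->c_1) | (y & in->c_1);
    out->diff = x ^ y ^ in->b_1;                           /* x - y - b */
    out->bo   = (!x & y) | (!(x ^ y) & in->b_1);
}

/* Drive the cell over nbits LSB first, copying co/bo back into c_1/b_1 each
   clock, the way main() copies p_r_1o[i]->co into p_r_1i[i]->c_1. */
void add_sub_word(unsigned long x, unsigned long y, int nbits,
                  unsigned long *sum, unsigned long *diff)
{
    struct add_cell in = {0, 0, 0, 0};   /* carry and borrow reset to 0 */
    struct add2_cell out;

    *sum = 0;
    *diff = 0;
    for (int i = 0; i < nbits; i++) {
        in.x_1 = (int)((x >> i) & 1UL);
        in.y_1 = (int)((y >> i) & 1UL);
        add_step(&in, &out);
        *sum  |= (unsigned long)out.sum << i;
        *diff |= (unsigned long)out.diff << i;
        in.c_1 = out.co;
        in.b_1 = out.bo;
    }
}
```

With 8-bit words, `add_sub_word(6, 13, 8, &s, &d)` leaves 19 in `s` and 249 (that is, -7) in `d`.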
int main(void)

{

FILE *ap, *bp, *cp, *dp, *ep, *gp, *hp, *ip, *jp, *kp, *lp, *mp,
     *np, *op, *pp, *qp;

/* initialize the pointers for the adders */

for (i = 0; i <= 8; i++)
{
    p_r_1i[i] = &r_col_1in[i];
    p_r_1o[i] = &r_col_1out[i];
    p_i_1i[i] = &i_col_1in[i];
    p_i_1o[i] = &i_col_1out[i];
}

for (i = 0; i <= 3; i++)
{
    p_r_2i[i] = &r_col_2in[i];
    p_r_2o[i] = &r_col_2out[i];
    p_i_2i[i] = &i_col_2in[i];
    p_i_2o[i] = &i_col_2out[i];
}

for (i = 0; i <= 6; i++)
{
    p_r_3i[i] = &r_col_3in[i];
    p_r_3o[i] = &r_col_3out[i];
    p_i_3i[i] = &i_col_3in[i];
    p_i_3o[i] = &i_col_3out[i];
}

/* open the files for the control words, input data, and output */

ap = fopen("master_control", "r");
bp = fopen("postadd_cntrl", "r");
cp = fopen("rmult_out", "r");
dp = fopen("imult_out", "r");
hp = fopen("rpostadd1_in", "w");
jp = fopen("rpostadd2_in", "w");
ip = fopen("rpostadd3_in", "w");
kp = fopen("rprcell_in", "w");
gp = fopen("rprcell_out", "w");
lp = fopen("rsipo_out", "w");
np = fopen("ipostadd1_in", "w");
op = fopen("ipostadd2_in", "w");
pp = fopen("ipostadd3_in", "w");
qp = fopen("iprcell_in", "w");
mp = fopen("isipo_out", "w");


fscanf(ap, "%d", &cycles);   /* number of full cycles that the program
                                will simulate */

while (clk_count <= cycles)

{

fscanf(ap, "%d", &clk_count);   /* read master control clock cycle */

clk = (clk_count % clk_cycle);
clk_int = (clk_int % clk_cycle);

/* check to see if internal and master clocks are synchronized */
if (clk != clk_int)
{
    printf(" clocks are not synchronized %d\n", clk);
    exit(1);
}

/* read all the control signals */

for (i = 0; i <= 19; i++)
    fscanf(ap, "%d", &flags[i]);

/* assign control variables to be used in this program */

sr_sipo = flags[14];

l_sipo = flags[13];
sd_sipo = flags[12];
p_calc = flags[5];
p_append = flags[6];
r_calc = flags[4];


fscanf(bp, "%d", &clk_add);
for (i = 0; i <= 2; i++)
    fscanf(bp, "%d", &rst_bit[i]);   /* read adder reset signals */


/*********************** POST ADD MODULE 1 ***********************/


/* call fn_add.c to add bit stream */

for (i = 0; i <= 8; i++)
{
    add(p_r_1i[i], p_r_1o[i]);
    add(p_i_1i[i], p_i_1o[i]);
}

/* The MSFFs in the postadder are reversed: data enters through phi_1 and
   leaves through phi_2. Assign to the output latch of the MSFF. */

for (i = 0; i <= 1; i++)
{
    rpostadd[i].fclk2 = rpostadd[i].fclk1;
    ipostadd[i].fclk2 = ipostadd[i].fclk1;
}


/*********************** POST ADD MODULE 2 ***********************/

for (i = 0; i <= 3; i++)
{
    add(p_r_2i[i], p_r_2o[i]);
    add(p_i_2i[i], p_i_2o[i]);
}

/* eight MSFFs in the second column */

for (i = 0; i <= 7; i++)
{
    rpostadd2[i].fclk2 = rpostadd2[i].fclk1;
    ipostadd2[i].fclk2 = ipostadd2[i].fclk1;
}

/*********************** POST ADD MODULE 3 ***********************/

for (i = 0; i <= 6; i++)   /* same as first column */
{
    add(p_r_3i[i], p_r_3o[i]);
    add(p_i_3i[i], p_i_3o[i]);
}

for (i = 0; i <= 1; i++)
{
    rpostadd3[i].fclk2 = rpostadd3[i].fclk1;
    ipostadd3[i].fclk2 = ipostadd3[i].fclk1;
}

/*********************** PARITY ROUND CELL ***********************/

/* move bits through the parity round cell, both real and imaginary */

for (i = 0; i <= 15; i++)
{
    (r_cell[i]).and_out = (r_cell[i]).clk1 & (r_cell[i]).r.fclk1;
    (r_cell[i]).r_or = (r_cell[i]).and_out | !r_calc;
    (r_cell[i]).in_xor = (r_cell[i]).clk1 ^ (r_cell[i]).r.fclk1;
    (r_cell[i]).p_xor = (r_cell[i]).in_xor ^ (r_cell[i]).p.fclk1;
    (r_cell[i]).p_or = (r_cell[i]).p_xor | !p_calc;

    (r_cell[i]).p.fclk2 = (r_cell[i]).p_or;
    (r_cell[i]).r.fclk2 = (r_cell[i]).r_or;

    /* check control signals for parity cell */
    if (p_append == 0)
        (r_cell[i]).clk2 = (r_cell[i]).in_xor;
    else
        (r_cell[i]).clk2 = (r_cell[i]).p.fclk1;

    (r_cell[i]).p.fclk1 = (r_cell[i]).p.fclk2;
    (r_cell[i]).r.fclk1 = (r_cell[i]).r.fclk2;

    (i_cell[i]).and_out = (i_cell[i]).clk1 & (i_cell[i]).r.fclk1;
    (i_cell[i]).r_or = (i_cell[i]).and_out | !r_calc;
    (i_cell[i]).in_xor = (i_cell[i]).clk1 ^ (i_cell[i]).r.fclk1;
    (i_cell[i]).p_xor = (i_cell[i]).in_xor ^ (i_cell[i]).p.fclk1;
    (i_cell[i]).p_or = (i_cell[i]).p_xor | !p_calc;

    (i_cell[i]).p.fclk2 = (i_cell[i]).p_or;
    (i_cell[i]).r.fclk2 = (i_cell[i]).r_or;

    if (p_append == 0)
        (i_cell[i]).clk2 = (i_cell[i]).in_xor;
    else
        (i_cell[i]).clk2 = (i_cell[i]).p.fclk1;

    (i_cell[i]).p.fclk1 = (i_cell[i]).p.fclk2;
    (i_cell[i]).r.fclk1 = (i_cell[i]).r.fclk2;
}


/*********************** SIPO CELL ***********************/

/* shift the data right in the serial path */

if (sr_sipo == 1)
{
    for (i = 15; i >= 0; i--)
        for (j = 23; j >= 0; j--)
            if (j == 23)
            {
                r_sipo[i][j].s_clk2 = r_phi1_latch[i];
                i_sipo[i][j].s_clk2 = i_phi1_latch[i];
            }
            else
            {
                r_sipo[i][j].s_clk2 = r_sipo[i][j + 1].s_clk1;
                i_sipo[i][j].s_clk2 = i_sipo[i][j + 1].s_clk1;
            }
}

/****************************************************************/

/* latch data from the serial path into the parallel path */

if (l_sipo == 1)
{
    for (i = 15; i >= 0; i--)
        for (j = 23; j >= 0; j--)
        {
            r_sipo[i][j].p_clk2 = r_sipo[i][j].s_clk1;
            i_sipo[i][j].p_clk2 = i_sipo[i][j].s_clk1;
        }
}

/****************************************************************/

/* shift the data down in the parallel path */

if (sd_sipo == 1)
{
    for (i = 15; i >= 0; i--)
        for (j = 23; j >= 0; j--)
            if (i != 15)   /* row 15 has no row i + 1 to shift from */
            {
                r_sipo[i][j].p_clk2 = r_sipo[i + 1][j].p_clk1;
                i_sipo[i][j].p_clk2 = i_sipo[i + 1][j].p_clk1;
            }
}


/************* CLOCK ONE OCCURRENCES ADD COLUMN ONE *************/

/* Assign the input multiplier results to the x and y variables within
   the adder data structures. In each comment, the left index is the input
   and the right index the output, using the Taylor numbering system. */

(p_r_1i[0])->x_1 = rmult[0];    /* t00 h00 */
(p_r_1i[0])->y_1 = rmult[1];    /* t01 h08 */

(p_r_1i[1])->x_1 = rmult[3];    /* t03 t100 */
(p_r_1i[1])->y_1 = rmult[8];    /* t05 t101 */

(p_r_1i[2])->x_1 = rmult[13];   /* t13 t102 */
(p_r_1i[2])->y_1 = rmult[6];    /* t11 t103 */

(p_r_1i[3])->x_1 = rmult[4];    /* t04 t104 */
(p_r_1i[3])->y_1 = rmult[9];    /* t06 t105 */

(p_r_1i[4])->x_1 = rmult[11];   /* t08 t106 */
(p_r_1i[4])->y_1 = rmult[10];   /* t07 */

(p_r_1i[5])->x_1 = rmult[12];   /* t09 */
(p_r_1i[5])->y_1 = rmult[10];   /* t07 t107 */

(p_r_1i[6])->x_1 = rmult[7];    /* t12 t108 */
(p_r_1i[6])->y_1 = rmult[14];   /* t14 t109 */

(p_r_1i[7])->x_1 = rmult[15];   /* t15 t110 */
(p_r_1i[7])->y_1 = rmult[16];   /* t15 */

(p_r_1i[8])->x_1 = rmult[15];   /* t15 */
(p_r_1i[8])->y_1 = rmult[17];   /* t17 t110 */

rpostadd[0].fclk1 = rmult[2];   /* t02 */
rpostadd[1].fclk1 = rmult[5];   /* t10 */


/*********************** IMAGINARY SECTION ***********************/

(p_i_1i[0])->x_1 = imult[0];    /* u00 h00 */
(p_i_1i[0])->y_1 = imult[1];    /* u01 h08 */

(p_i_1i[1])->x_1 = imult[3];    /* u03 u100 */
(p_i_1i[1])->y_1 = imult[8];    /* u05 u101 */

(p_i_1i[2])->x_1 = imult[13];   /* u13 u102 */
(p_i_1i[2])->y_1 = imult[6];    /* u11 u103 */

(p_i_1i[3])->x_1 = imult[4];    /* u04 u104 */
(p_i_1i[3])->y_1 = imult[9];    /* u06 u105 */

(p_i_1i[4])->x_1 = imult[11];   /* u08 u106 */
(p_i_1i[4])->y_1 = imult[10];   /* u07 */

(p_i_1i[5])->x_1 = imult[12];   /* u12 */
(p_i_1i[5])->y_1 = imult[10];   /* u07 u107 */

(p_i_1i[6])->x_1 = imult[7];    /* u12 u108 */
(p_i_1i[6])->y_1 = imult[14];   /* u14 u109 */

(p_i_1i[7])->x_1 = imult[15];   /* u15 u110 */
(p_i_1i[7])->y_1 = imult[16];   /* u15 */

(p_i_1i[8])->x_1 = imult[15];   /* u15 */
(p_i_1i[8])->y_1 = imult[17];   /* u17 u110 */

ipostadd[0].fclk1 = imult[2];   /* u02 */
ipostadd[1].fclk1 = imult[5];   /* u10 */

for (i = 0; i <= 8; i++)   /* move carry, borrow to CLK1 latches */
{
    (p_r_1i[i])->c_1 = p_r_1o[i]->co;
    (p_r_1i[i])->b_1 = p_r_1o[i]->bo;
    (p_i_1i[i])->c_1 = p_i_1o[i]->co;
    (p_i_1i[i])->b_1 = p_i_1o[i]->bo;
}

/* reset is active low: when the reset bit is 0, clear the carry and borrow */

if (rst_bit[0] == 0)
    for (i = 0; i <= 8; i++)
    {
        (p_r_1o[i])->co = 0;
        (p_r_1o[i])->bo = 0;
        (p_r_1i[i])->c_1 = 0;
        (p_r_1i[i])->b_1 = 0;
        (p_i_1o[i])->co = 0;
        (p_i_1o[i])->bo = 0;
        (p_i_1i[i])->c_1 = 0;
        (p_i_1i[i])->b_1 = 0;
    }


/* read input data from multiplier */

fscanf(cp, "%d", &clk_real);
fscanf(dp, "%d", &clk_add);
for (i = 0; i <= 17; i++)
{
    fscanf(cp, "%d", &rmult[i]);
    fscanf(dp, "%d", &imult[i]);
}

if (clk_count >= 79)   /* first input not expected until clock 79 */
{
    /* print real and imaginary inputs to the output files */
    fprintf(hp, " %d ", clk_count);
    for (i = 0; i <= 8; i++)
        fprintf(hp, "%d %d ", (p_r_1i[i])->x_1, (p_r_1i[i])->y_1);
    fprintf(hp, "%d %d ", rpostadd[0].fclk1, rpostadd[1].fclk1);

    fprintf(np, " %d ", clk_count);
    for (i = 0; i <= 8; i++)
        fprintf(np, "%d %d ", (p_i_1i[i])->x_1, (p_i_1i[i])->y_1);
    fprintf(np, "%d %d ", ipostadd[0].fclk1, ipostadd[1].fclk1);
}

/************ CLOCK ONE OCCURRENCES ADD COLUMN TWO ************/

(p_r_2i[0])->x_1 = (p_r_1o[3])->sum;    /* t104 t200 */
(p_r_2i[0])->y_1 = (p_r_1o[4])->diff;   /* t106 t201 */
(p_r_2i[1])->x_1 = (p_r_1o[3])->diff;   /* t105 t202 */
(p_r_2i[1])->y_1 = (p_r_1o[5])->diff;   /* t107 t203 */
(p_r_2i[2])->x_1 = (p_r_1o[6])->sum;    /* t108 t204 */
(p_r_2i[2])->y_1 = (p_r_1o[7])->sum;    /* t110 t205 */
(p_r_2i[3])->x_1 = (p_r_1o[6])->diff;   /* t109 t206 */
(p_r_2i[3])->y_1 = (p_r_1o[8])->diff;   /* t111 t207 */

rpostadd2[0].fclk1 = rpostadd[0].fclk2;    /* t02 */
rpostadd2[1].fclk1 = (p_r_1o[0])->sum;     /* h0 */
rpostadd2[2].fclk1 = (p_r_1o[0])->diff;    /* h8 */
rpostadd2[3].fclk1 = (p_r_1o[1])->sum;     /* t100 */
rpostadd2[4].fclk1 = (p_r_1o[1])->diff;    /* t101 */
rpostadd2[5].fclk1 = rpostadd[1].fclk2;    /* t10 */
rpostadd2[6].fclk1 = (p_r_1o[2])->sum;     /* t102 */
rpostadd2[7].fclk1 = (p_r_1o[2])->diff;    /* t103 */

/*********************** IMAGINARY SECTION ***********************/

(p_i_2i[0])->x_1 = (p_i_1o[3])->sum;    /* t104 t200 */
(p_i_2i[0])->y_1 = (p_i_1o[4])->diff;   /* t106 t201 */
(p_i_2i[1])->x_1 = (p_i_1o[3])->diff;   /* t105 t202 */
(p_i_2i[1])->y_1 = (p_i_1o[5])->diff;   /* t107 t203 */
(p_i_2i[2])->x_1 = (p_i_1o[6])->sum;    /* t108 t204 */
(p_i_2i[2])->y_1 = (p_i_1o[7])->sum;    /* t110 t205 */
(p_i_2i[3])->x_1 = (p_i_1o[6])->diff;   /* t109 t206 */
(p_i_2i[3])->y_1 = (p_i_1o[8])->diff;   /* t111 t207 */

ipostadd2[0].fclk1 = ipostadd[0].fclk2;    /* t02 */
ipostadd2[1].fclk1 = (p_i_1o[0])->sum;     /* h0 */
ipostadd2[2].fclk1 = (p_i_1o[0])->diff;    /* h8 */
ipostadd2[3].fclk1 = (p_i_1o[1])->sum;     /* t100 */
ipostadd2[4].fclk1 = (p_i_1o[1])->diff;    /* t101 */
ipostadd2[5].fclk1 = ipostadd[1].fclk2;    /* t10 */
ipostadd2[6].fclk1 = (p_i_1o[2])->sum;     /* t102 */
ipostadd2[7].fclk1 = (p_i_1o[2])->diff;    /* t103 */

for (i = 0; i <= 3; i++)   /* shift carry, borrow on phi_1 pulse */
{
    (p_r_2i[i])->c_1 = (p_r_2o[i])->co;
    (p_r_2i[i])->b_1 = (p_r_2o[i])->bo;
    (p_i_2i[i])->c_1 = (p_i_2o[i])->co;
    (p_i_2i[i])->b_1 = (p_i_2o[i])->bo;
}

if (rst_bit[1] == 0)   /* reset is active low: clear the carry and borrow */
    for (i = 0; i <= 3; i++)
    {
        (p_r_2o[i])->co = 0;
        (p_r_2o[i])->bo = 0;
        (p_r_2i[i])->c_1 = 0;
        (p_r_2i[i])->b_1 = 0;
        (p_i_2o[i])->co = 0;
        (p_i_2o[i])->bo = 0;
        (p_i_2i[i])->c_1 = 0;
        (p_i_2i[i])->b_1 = 0;
    }


if (clk_count >= 80)   /* print the real and imaginary inputs to column two,
                          which are the outputs of column one */
{
    fprintf(jp, " %d ", clk_count);
    for (i = 0; i <= 3; i++)
        fprintf(jp, "%d %d ", (p_r_2i[i])->x_1, (p_r_2i[i])->y_1);
    for (i = 0; i <= 7; i++)
        fprintf(jp, " %d", rpostadd2[i].fclk1);

    fprintf(op, " %d ", clk_count);
    for (i = 0; i <= 3; i++)
        fprintf(op, "%d %d ", (p_i_2i[i])->x_1, (p_i_2i[i])->y_1);
    for (i = 0; i <= 7; i++)
        fprintf(op, " %d", ipostadd2[i].fclk1);
}

/************ CLOCK ONE OCCURRENCES ADD COLUMN THREE ************/

/* assign the outputs of the column-two adders, both real and imaginary,
   to the inputs of column three */

(p_r_3i[0])->x_1 = (p_r_2o[0])->sum;     /* t200 */
(p_r_3i[1])->x_1 = rpostadd2[3].fclk2;   /* t100 */
(p_r_3i[2])->x_1 = (p_r_2o[1])->diff;    /* t203 */
(p_r_3i[3])->x_1 = rpostadd2[0].fclk2;   /* t02 */
(p_r_3i[4])->x_1 = (p_r_2o[1])->sum;     /* t202 */
(p_r_3i[5])->x_1 = rpostadd2[4].fclk2;   /* t101 */
(p_r_3i[6])->x_1 = (p_r_2o[0])->diff;    /* t201 */

/* in the real case, the imaginary term is assigned to the y variable */

(p_r_3i[0])->y_1 = (p_i_2o[2])->sum;     /* u204 */
(p_r_3i[1])->y_1 = ipostadd2[6].fclk2;   /* u102 */
(p_r_3i[2])->y_1 = (p_i_2o[3])->diff;    /* u207 */
(p_r_3i[3])->y_1 = ipostadd2[5].fclk2;   /* u10 */
(p_r_3i[4])->y_1 = (p_i_2o[3])->sum;     /* u206 */
(p_r_3i[5])->y_1 = ipostadd2[7].fclk2;   /* u103 */
(p_r_3i[6])->y_1 = (p_i_2o[2])->diff;    /* u205 */

/*********************** IMAGINARY SECTION ***********************/

(p_i_3i[0])->x_1 = (p_i_2o[0])->sum;
(p_i_3i[1])->x_1 = ipostadd2[3].fclk2;
(p_i_3i[2])->x_1 = (p_i_2o[1])->diff;
(p_i_3i[3])->x_1 = ipostadd2[0].fclk2;
(p_i_3i[4])->x_1 = (p_i_2o[1])->sum;
(p_i_3i[5])->x_1 = ipostadd2[4].fclk2;
(p_i_3i[6])->x_1 = (p_i_2o[0])->diff;

/* in the imaginary case, the real term is assigned to the y variable */

(p_i_3i[0])->y_1 = (p_r_2o[2])->sum;
(p_i_3i[1])->y_1 = rpostadd2[6].fclk2;
(p_i_3i[2])->y_1 = (p_r_2o[3])->diff;
(p_i_3i[3])->y_1 = rpostadd2[5].fclk2;
(p_i_3i[4])->y_1 = (p_r_2o[3])->sum;
(p_i_3i[5])->y_1 = rpostadd2[7].fclk2;
(p_i_3i[6])->y_1 = (p_r_2o[2])->diff;

rpostadd3[0].fclk1 = rpostadd2[1].fclk2;
rpostadd3[1].fclk1 = rpostadd2[2].fclk2;

ipostadd3[0].fclk1 = ipostadd2[1].fclk2;
ipostadd3[1].fclk1 = ipostadd2[2].fclk2;

for (i = 0; i <= 6; i++)
{
    (p_r_3i[i])->c_1 = (p_r_3o[i])->co;
    (p_r_3i[i])->b_1 = (p_r_3o[i])->bo;
    (p_i_3i[i])->c_1 = (p_i_3o[i])->co;
    (p_i_3i[i])->b_1 = (p_i_3o[i])->bo;
}


if (rst_bit[2] == 0)   /* reset carry/borrow */
{
    for (i = 0; i <= 6; i++)
    {
        (p_r_3o[i])->co = 0;
        (p_r_3o[i])->bo = 0;
        (p_r_3i[i])->c_1 = 0;
        (p_r_3i[i])->b_1 = 0;
        (p_i_3o[i])->co = 0;
        (p_i_3o[i])->bo = 0;
        (p_i_3i[i])->c_1 = 0;
        (p_i_3i[i])->b_1 = 0;
    }
}


if (clk_count >= 81)   /* print results */
{
    fprintf(ip, " %d ", clk_count);
    for (i = 0; i <= 6; i++)
        fprintf(ip, "%d %d ", p_r_3i[i]->x_1, p_r_3i[i]->y_1);
    for (i = 0; i <= 1; i++)
        fprintf(ip, "%d ", rpostadd3[i].fclk1);

    fprintf(pp, " %d ", clk_count);
    for (i = 0; i <= 6; i++)
        fprintf(pp, "%d %d ", p_i_3i[i]->x_1, p_i_3i[i]->y_1);
    for (i = 0; i <= 1; i++)
        fprintf(pp, "%d ", ipostadd3[i].fclk1);
}


/******************** PARITY ROUND CELL INPUT ********************/

/* assignments to the phi_1 latch in the pr_cell */

r_cell[0].clk1  = rpostadd3[0].fclk2;
r_cell[1].clk1  = (p_r_3o[0])->diff;
r_cell[2].clk1  = (p_r_3o[1])->diff;
r_cell[3].clk1  = (p_r_3o[2])->sum;
r_cell[4].clk1  = (p_r_3o[3])->diff;
r_cell[5].clk1  = (p_r_3o[4])->diff;
r_cell[6].clk1  = (p_r_3o[5])->diff;
r_cell[7].clk1  = (p_r_3o[6])->sum;
r_cell[8].clk1  = rpostadd3[1].fclk2;
r_cell[9].clk1  = (p_r_3o[6])->diff;
r_cell[10].clk1 = (p_r_3o[5])->sum;
r_cell[11].clk1 = (p_r_3o[4])->sum;
r_cell[12].clk1 = (p_r_3o[3])->sum;
r_cell[13].clk1 = (p_r_3o[2])->diff;
r_cell[14].clk1 = (p_r_3o[1])->sum;
r_cell[15].clk1 = (p_r_3o[0])->sum;

i_cell[0].clk1  = ipostadd3[0].fclk2;
i_cell[1].clk1  = (p_i_3o[0])->sum;
i_cell[2].clk1  = (p_i_3o[1])->sum;
i_cell[3].clk1  = (p_i_3o[2])->diff;
i_cell[4].clk1  = (p_i_3o[3])->sum;
i_cell[5].clk1  = (p_i_3o[4])->sum;
i_cell[6].clk1  = (p_i_3o[5])->sum;
i_cell[7].clk1  = (p_i_3o[6])->diff;
i_cell[8].clk1  = ipostadd3[1].fclk2;
i_cell[9].clk1  = (p_i_3o[6])->sum;
i_cell[10].clk1 = (p_i_3o[5])->diff;
i_cell[11].clk1 = (p_i_3o[4])->diff;
i_cell[12].clk1 = (p_i_3o[3])->diff;
i_cell[13].clk1 = (p_i_3o[2])->sum;
i_cell[14].clk1 = (p_i_3o[1])->diff;
i_cell[15].clk1 = (p_i_3o[0])->diff;


if (clk_count >= 82)
{
    fprintf(qp, " %d", clk_count);
    fprintf(kp, " %d", clk_count);
    for (i = 0; i <= 15; i++)
    {
        fprintf(kp, " %d ", (r_cell[i]).clk1);
        fprintf(qp, " %d ", (i_cell[i]).clk1);
    }
}

/********* PHI ONE LATCH BETWEEN PARITY AND SIPO CELL *********/

/* This latch returns the pipeline to its normal configuration:
   phi_2 leading, phi_1 trailing. */

for (i = 0; i <= 15; i++)
{
    r_phi1_latch[i] = r_cell[i].clk2;
    i_phi1_latch[i] = i_cell[i].clk2;
}

if (clk_count >= 83)
{
    fprintf(gp, " %d ", clk_count);
    for (i = 0; i <= 15; i++)
        fprintf(gp, " %d ", r_phi1_latch[i]);
}

/******************* SIPO CLOCK ONE OCCURRENCES *******************/

/* shifting from phi_2 to phi_1 */

for (i = 15; i >= 0; i--)
    for (j = 23; j >= 0; j--)
    {
        r_sipo[i][j].s_clk1 = r_sipo[i][j].s_clk2;
        r_sipo[i][j].p_clk1 = r_sipo[i][j].p_clk2;
        i_sipo[i][j].s_clk1 = i_sipo[i][j].s_clk2;
        i_sipo[i][j].p_clk1 = i_sipo[i][j].p_clk2;
    }


/**********************************************************************/

if (clk_count >= 117)   /* write data out of SIPO */
{
    fprintf(lp, " ");
    fprintf(mp, " ");
    for (j = 23; j >= 0; j--)
    {
        fprintf(lp, " %d ", r_sipo[0][j].p_clk1);
        fprintf(mp, " %d ", i_sipo[0][j].p_clk1);
    }
}


clk_int += 1;   /* increment the internal counter */

}   /* end loop */

/* close all files */

fclose(ap);
fclose(bp);
fclose(cp);
fclose(dp);
fclose(gp);
fclose(hp);
fclose(ip);
fclose(jp);
fclose(kp);
fclose(lp);
fclose(mp);
fclose(np);
fclose(op);
fclose(pp);
fclose(qp);

return 0;
}

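In the parity round cell, `p_xor` XORs each outgoing data bit into the parity flip-flop, and `p_or` forces that flip-flop to 1 while `p_calc` is 0. Starting the accumulator at 1 produces odd parity, the same rule bin.c applies when it builds the input words. The sketch below models only that parity path, assuming LSB-first data; the rounding path driven by `r_calc` is not modeled, and `serial_odd_parity` and `append_odd_parity` are hypothetical names.

```c
/* Serial odd-parity generator: the accumulator starts at 1 (as p_or is forced
   to 1 while p_calc == 0) and XORs in each data bit, like p_xor. After ndata
   bits it holds the bit that makes the total number of ones odd. */
int serial_odd_parity(unsigned long data, int ndata)
{
    int p_ff = 1;                              /* preset while p_calc == 0 */

    for (int i = 0; i < ndata; i++)
        p_ff ^= (int)((data >> i) & 1UL);      /* p_xor = data ^ p_ff */
    return p_ff;
}

/* Place the parity bit just above the ndata data bits, as the cell does when
   p_append selects p.fclk1 instead of the data path. */
unsigned long append_odd_parity(unsigned long data, int ndata)
{
    unsigned long mask = (1UL << ndata) - 1UL;
    unsigned long p = (unsigned long)serial_odd_parity(data, ndata);

    return (data & mask) | (p << ndata);
}
```

For 23 data bits, the value 3 has two ones, so `append_odd_parity(3, 23)` sets bit 23 to bring the count to three.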

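Each row of the SIPO (serial-in, parallel-out) buffer shifts right in its serial stage, taking the new bit at position 23, and on `l_sipo` copies the serial stage into the parallel stage. The listing keeps separate clk1 and clk2 copies of every stage; the sketch below collapses them into one array and updates positions in ascending order, so each stage still reads its neighbor's old value. It assumes LSB-first input, under which bit j of a 24-bit word ends up in position j after 24 shifts. The row-to-row shift (`sd_sipo`) is omitted, and `sipo_row`, `sipo_shift`, `sipo_latch`, and `sipo_value` are hypothetical names.

```c
#define SIPO_BITS 24

/* One hypothetical SIPO row: serial stage s[] and parallel stage p[]. */
struct sipo_row {
    int s[SIPO_BITS];
    int p[SIPO_BITS];
};

/* Shift right: each stage takes the old value of stage j + 1, and the new
   bit enters at position 23. Ascending order reads old values first. */
void sipo_shift(struct sipo_row *r, int in_bit)
{
    for (int j = 0; j < SIPO_BITS - 1; j++)
        r->s[j] = r->s[j + 1];
    r->s[SIPO_BITS - 1] = in_bit;
}

/* Latch the serial stage into the parallel stage (the l_sipo action). */
void sipo_latch(struct sipo_row *r)
{
    for (int j = 0; j < SIPO_BITS; j++)
        r->p[j] = r->s[j];
}

/* Read the parallel stage back as an integer, position j as bit j. */
unsigned long sipo_value(const struct sipo_row *r)
{
    unsigned long v = 0;

    for (int j = 0; j < SIPO_BITS; j++)
        v |= (unsigned long)(r->p[j] & 1) << j;
    return v;
}
```

Shifting in the 24 bits of a word LSB first and then latching reproduces the word in the parallel stage, which is what the listing prints when it writes `r_sipo[0][j].p_clk1` from j = 23 down to 0.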
0.1.

Decimal 


/**********************************************************************
DATE: 15 SEP 1985

TITLE: decimal to binary conversion program.

FILENAME: bin.c
COORDINATOR: Jim Collins.

PROJECT: WFT16 SIMULATION
USE: Converts decimal input numbers into their binary
     representation. Reads the input file wfta_in and
     produces the binary output in file test_piso (the name
     of the input file for the simulation program).

**********************************************************************/

#include <stdio.h>

int main(void)

{

FILE *fp, *gp;
int j, bit[24], mask;
long num, k;
int sum = 0;
int x;


fp = fopen("wfta_in", "r");
gp = fopen("test_piso", "w");

fscanf(fp, "%ld", &k);

while (k != -1)
{

    num = k;

    for (j = 23; j >= 0; j--)
    {

        /* The numbers are converted using a shift and add function. The masks,
           i.e. MASK00, are not included in this file for reasons of space. */

        switch (j)
        {
        case 0:  bit[j] = (k & MASK00) >> j; break;
        case 1:  bit[j] = (k & MASK01) >> j; break;
        case 2:  bit[j] = (k & MASK02) >> j; break;
        case 3:  bit[j] = (k & MASK03) >> j; break;
        case 4:  bit[j] = (k & MASK04) >> j; break;
        case 5:  bit[j] = (k & MASK05) >> j; break;
        case 6:  bit[j] = (k & MASK06) >> j; break;
        case 7:  bit[j] = (k & MASK07) >> j; break;
        case 8:  bit[j] = (k & MASK08) >> j; break;
        case 9:  bit[j] = (k & MASK09) >> j; break;
        case 10: bit[j] = (k & MASK10) >> j; break;
        case 11: bit[j] = (k & MASK11) >> j; break;
        case 12: bit[j] = (k & MASK12) >> j; break;
        case 13: bit[j] = (k & MASK13) >> j; break;
        case 14: bit[j] = (k & MASK14) >> j; break;
        case 15: bit[j] = (k & MASK15) >> j; break;
        case 16: bit[j] = (k & MASK16) >> j; break;
        case 17: bit[j] = (k & MASK17) >> j; break;
        case 18: bit[j] = (k & MASK18) >> j; break;
        case 19: bit[j] = (k & MASK19) >> j; break;
        case 20: bit[j] = (k & MASK20) >> j; break;
        case 21: bit[j] = (k & MASK21) >> j; break;
        case 22: bit[j] = (k & MASK22) >> j; break;
        case 23: bit[j] = (k & MASK23) >> j; break;
        case 24: bit[j] = (k & MASK24) >> j; break;
        case 25: bit[j] = (k & MASK25) >> j; break;
        case 26: bit[j] = (k & MASK26) >> j; break;
        case 27: bit[j] = (k & MASK27) >> j; break;
        case 28: bit[j] = (k & MASK28) >> j; break;
        case 29: bit[j] = (k & MASK29) >> j; break;
        case 30: bit[j] = (k & MASK30) >> j; break;
        case 31: bit[j] = (k & MASK31) >> j; break;
        } /* end switch */

    } /* end for loop */

    sum = 0;

    for (j = 0; j <= 22; j++)
        sum = sum + bit[j];

    /* Odd parity requires that the number of ones in the data word be odd.
       In this case, if the number of ones is even, a one is appended in the
       MSB position; otherwise a zero is appended. */

    if (sum % 2 == 0)
        bit[23] = 1;
    else
        bit[23] = 0;

    for (j = 23; j >= 0; j--)
        fprintf(gp, " %d", bit[j]);
    fprintf(gp, "\n");

    fscanf(fp, "%ld", &k);   /* get next number */


}   /* end while */

}   /* end main */
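Assuming each mask `MASKnn` equals `1L << nn` (the masks are not reproduced in the listing), the 32-case switch reduces to a single shift-and-mask per bit. The sketch below does that and then applies the same odd-parity rule to bit 23. It converts through `unsigned long` so that negative inputs yield their two's complement bits without relying on right shifts of negative values; `to_bits_odd_parity` is a hypothetical name.

```c
/* bits[0] is the LSB. Bits 0..22 carry data; bits[23] receives odd parity
   over bits 0..22, as in bin.c. Assumes MASKnn == 1L << nn. */
void to_bits_odd_parity(long k, int bits[24])
{
    unsigned long u = (unsigned long)k;   /* two's complement bit pattern */
    int ones = 0;

    for (int j = 0; j < 24; j++)
        bits[j] = (int)((u >> j) & 1UL);  /* same as (k & MASKjj) >> j */

    for (int j = 0; j <= 22; j++)
        ones += bits[j];

    bits[23] = (ones % 2 == 0) ? 1 : 0;   /* make the total count of ones odd */
}
```

Writing `bits[23]` down to `bits[0]` then gives the MSB-first line that bin.c prints to test_piso.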

/*********************************************************************
DATE: 2 AUG 1985

TITLE: Binary to Decimal Conversion Program
FILENAME: form16.c
COORDINATOR: Jim Collins.

PROJECT: WFT16 SIMULATION

USE: This program takes the 16 serial outputs per clock
     cycle, for any column, and converts them from a vertical
     format to a horizontal display. It also converts the
     binary output streams into decimal numbers. There is a
     family of these programs, one for each possible number
     of outputs per column.

**********************************************************************/
#include <stdio.h>

#define clk_cycles 32   /* number used within one file */
int main(void)

{

FILE *ap, *bp, *cp;


int i, j, flag, cycles;
int reformat[17][32];
unsigned long sum;
int count;


ap = fopen("master_control", "r");
bp = fopen("in_format", "r");
cp = fopen("out_format", "w");

fscanf(ap, "%d", &cycles);
count = cycles / clk_cycles;
while (count > 0)
{

    for (j = 0; j <= 31; j++)
        for (i = 0; i <= 16; i++)
            fscanf(bp, "%d", &reformat[i][j]);

    for (i = 0; i <= 16; i++)
    {
        for (j = 0; j <= 31; j++)
            fprintf(cp, " %d ", reformat[i][j]);
        fprintf(cp, "\n");
    }

    /* Convert the bit streams into decimal. If the MSB is a 1, the
       result is negative; the program handles this in the same
       manner as normal two's complement conversion. */

    for (i = 0; i <= 16; i++)
    {
        sum = 0;

        for (j = 0; j <= 31; j++)
        {
            /* the first entry is the clock tag */
            else if (reformat[i][31] == 1)
            {
                sum = sum + ((unsigned long) !(reformat[i][j]) << j);
                flag = 1;
            }
            else
            {
                sum = sum + ((unsigned long) reformat[i][j] << j);
                flag = 0;
            }
        }

        if (flag == 1)
        {
            sum = sum + 1;
            printf(" [%d] = -%lu\n", (i - 1), sum);
        }
        else
            printf(" [%d] = %lu\n", (i - 1), sum);

    }

count  =  count  -  1; 

}  /*  end  while  */ 

} 
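The conversion loop in this program inverts every bit when the sign bit (`reformat[i][31]`) is 1, sums the weighted bits, adds one, and prints the result with a minus sign: the textbook two's complement rule. The sketch below restates that rule as a function over a bit array indexed LSB first; `bits_to_long` is a hypothetical name, and it assumes `nbits` is smaller than the number of bits in a `long` so the magnitude always fits.

```c
/* Two's complement bit vector (bits[0] = LSB, bits[nbits - 1] = sign) to a
   signed value, following the invert, add one, negate rule of the listing.
   Assumes nbits is smaller than the width of long. */
long bits_to_long(const int bits[], int nbits)
{
    int negative = (bits[nbits - 1] == 1);
    unsigned long mag = 0;

    for (int j = 0; j < nbits; j++) {
        int b = negative ? !bits[j] : bits[j];   /* invert when negative */
        mag += (unsigned long)b << j;
    }
    if (negative)
        return -(long)(mag + 1UL);               /* add one, then negate */
    return (long)mag;
}
```

For an 8-bit word, the pattern 11111101 (LSB-first array `{1, 0, 1, 1, 1, 1, 1, 1}`) inverts to 2, becomes 3 after adding one, and is reported as -3.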
Continuing advances in state-of-the-art silicon fabrication
technology have allowed tremendous increases in the performance
that may be achieved by a single integrated circuit. The
natural counterpart of this increased functionality is, of
course, increased design complexity. A growing problem is
how to concisely and accurately communicate design information
on VLSI and VHSIC class circuits.

The VHSIC program office has sponsored the development of
a hardware description language designed to address this
problem. The VHSIC Hardware Description Language (VHDL) was
applied to the problem of modeling a custom signal processor
employing the Winograd Fourier Transform. A methodology was
[...]ents, and then models the behavior and structure of the
macrocells that comprise those subcomponents. Additionally,
a custom simulator was developed to verify the timing, control,
and hardware macrocells used in the implementation of the
signal processor. The simulation modeled the circuit at the
bit level and validated the architecture and expected numerical
performance.